Abstract

Existing works on distributed consensus explore linear iterations based on reversible Markov chains, 
which contribute to the slow convergence of the algorithms. It has been observed that by overcoming 
the diffusive behavior of reversible chains, certain nonreversible chains lifted from reversible ones 
mix substantially faster than the original chains. In this paper, we investigate the idea of accelerating 
distributed consensus via lifting Markov chains, and propose a class of Location-Aided Distributed 
Averaging (LADA) algorithms for wireless networks, where nodes' coarse location information is used 
to construct nonreversible chains that facilitate distributed computing and cooperative processing. First, 
two general pseudo-algorithms are presented to illustrate the notion of distributed averaging through 
chain-lifting. These pseudo-algorithms are then respectively instantiated through one LADA algorithm 
on grid networks, and one on general wireless networks. For a $k \times k$ grid network, the proposed LADA
algorithm achieves an $\epsilon$-averaging time of $O(k \log(\epsilon^{-1}))$. Based on this algorithm, in a wireless network
with transmission range $r$, an $\epsilon$-averaging time of $O(r^{-1} \log(\epsilon^{-1}))$ can be attained through a centralized
algorithm. Subsequently, we present a fully-distributed LADA algorithm for wireless networks, which 
utilizes only the direction information of neighbors to construct nonreversible chains. It is shown that this 
distributed LADA algorithm achieves the same scaling law in averaging time as the centralized scheme 
in wireless networks for all r satisfying the connectivity requirement. The constructed chain attains the 
optimal scaling law in terms of an important mixing metric, the fill time, among all chains lifted from 
one with an approximately uniform stationary distribution on geometric random graphs. Finally, we 
propose a cluster-based LADA (C-LADA) algorithm, which, requiring no central coordination, provides
the additional benefit of reduced message complexity compared with the distributed LADA algorithm. 

Index Terms 

Clustering, Distributed Computation, Distributed Consensus, Message Complexity, Mixing Time, 
Nonreversible Markov Chains, Time Complexity 

I. Introduction 

As a basic building block for networked information processing, distributed consensus admits many 
important applications in various areas, such as distributed estimation and data fusion, coordination and 
cooperation of autonomous agents, as well as network optimization. The distributed averaging problem 
where nodes try to reach consensus on the average value^1 through iterative local information exchange has
been vigorously investigated recently [1]-[6]. Compared with centralized counterparts, such distributed
algorithms scale well as the network grows, and exhibit robustness to node and link failures. Distributed
consensus can be realized through linear iteration in the form $\mathbf{x}(t+1) = W(t)\mathbf{x}(t)$, where $W(t)$ is a
graph-conformant matrix^2. Distributed averaging through linear iteration with a deterministic $W$ is studied
in [1]. For time-varying W(t), convergence is guaranteed under mild conditions [2], [3]. The class of 
randomized gossip algorithms recently studied by Boyd et al. [4], [5] realizes consensus through iterative
pairwise averaging, and allows for asynchronous operation. In their study, independent and identically
distributed random matrices $W(t)$ are considered, and performance of the proposed algorithms is governed
by the second largest eigenvalue of $E[W(t)]$.
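As a minimal sketch of the linear iteration $\mathbf{x}(t+1) = W\mathbf{x}(t)$ with a fixed, symmetric, doubly stochastic $W$, consider an 8-node ring. The ring topology, the weights (1/2 on the node itself, 1/4 on each neighbor) and the function names are illustrative assumptions, not taken from the cited works.

```python
def ring_weights(n):
    """Symmetric, doubly stochastic W for an n-node ring (n >= 3). Illustrative choice."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 0.5                  # keep half of the own value
        W[i][(i - 1) % n] = 0.25       # a quarter from each ring neighbor
        W[i][(i + 1) % n] = 0.25
    return W


def iterate(W, x, steps):
    """Apply x(t+1) = W x(t) for the given number of steps."""
    for _ in range(steps):
        x = [sum(w * v for w, v in zip(row, x)) for row in W]
    return x


x0 = [8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
x = iterate(ring_weights(len(x0)), x0, 200)
print(x)   # every entry is close to the average, 1.0
```

Because $W$ is doubly stochastic, the sum of the node values is preserved at every step, so the only possible consensus value is the average.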

Typically, governing matrices in distributed consensus algorithms are chosen to be stochastic, which 
connects them closely to Markov chain theory. It is also convenient to view the evolution of a Markov
chain $P$ as a random walk on a graph (with vertex set $V$ being the state space of the chain, and edge set
$E = \{uv : P_{uv} > 0\}$). In both fixed and random algorithms studied in [1], [4], [5], mainly a symmetric,
doubly stochastic weight matrix is used, hence the convergence time of such algorithms is closely related 
to the mixing time of a reversible random walk, which is usually slow due to its diffusive behavior. It 
has been shown in [5] that in a wireless network of size n with a common transmission range r, the 

^1 With appropriate modification, such algorithms can also be extended to computation of weighted sums, linear synopses,
histograms and types, and can address a large class of distributed computing and statistical inference problems.

^2 For a graph $G = (V, E)$ with the vertex set $V$ and edge set $E$, a matrix $W$ of size $|V| \times |V|$ is $G$-conformant if
$W_{ij} \neq 0$ only if $(i, j) \in E$.
optimal gossip algorithm requires $\Theta(r^{-2} \log(\epsilon^{-1}))$^3 time for the relative error to be bounded by $\epsilon$. This
means that for a small radius of transmission, even the fastest gossip algorithm converges slowly. 

Reversible Markov chains are dominant in the research literature, as they are mathematically more tractable
(see [7] and references therein). However, it is observed by Diaconis et al. [8] and later by Chen et
al. [9] that certain nonreversible chains mix substantially faster than corresponding reversible chains, by 
overcoming the diffusive behavior of reversible random walks. Our work is directly motivated by this 
finding, as well as the close relationship between distributed consensus algorithms and Markov chains. 
We first show that by allowing each node in a network to maintain multiple values, mimicking the 
multiple lifted states from a single state, a nonreversible chain on a lifted state space can be simulated, 
and we present two general pseudo-algorithms for this purpose. The next and more challenging step is to 
explicitly construct fast-mixing non-reversible chains given the network graphs. In this work, we propose 
a class of Location-Aided Distributed Averaging (LADA) algorithms that result in significantly improved
averaging times compared with existing algorithms. As the name implies, the algorithms utilize (coarse)
location information to construct nonreversible chains that prevent the same information from being "bounced"
back and forth, thus accelerating information dissemination.

Two important types of networks, grid networks and general wireless networks modeled by geometric 
random graphs, are considered in this work. For a $k \times k$ grid, we propose a LADA algorithm as an
application of our Pseudo-Algorithm 1, and show that it takes $O(k \log(\epsilon^{-1}))$ time to reach a relative
error within $\epsilon$. Then, for the celebrated geometric random graph $G(n, r)$ with a common transmission
range $r$, we present a centralized grid-based algorithm which exploits the LADA algorithm on the grid
to achieve an $\epsilon$-averaging time of $O(r^{-1} \log(\epsilon^{-1}))$.

In practice, purely distributed algorithms requiring no central coordination are typically preferred. 
Consequently, we propose a fully-distributed LADA algorithm, as an instantiation of Pseudo-Algorithm
2. On a wireless network with randomly distributed nodes, the constructed chain does not possess a
uniform stationary distribution desirable for distributed averaging, due to the difference in the number
of neighbors a node has in different directions. Nevertheless, we show that the non-uniformity of the
stationary distribution can be compensated by weight variables which estimate the stationary probabilities,
and that the algorithm achieves an $\epsilon$-averaging time of $O(r^{-1} \log(\epsilon^{-1}))$ with any transmission range $r$

^3 We use the following order notations in this paper: Let $f(n)$ and $g(n)$ be nonnegative functions for $n > 0$. We say
$f(n) = O(g(n))$ and $g(n) = \Omega(f(n))$ if there exist some $k$ and $c > 0$ such that $f(n) \le c\,g(n)$ for $n > k$; $f(n) = \Theta(g(n))$
if $f(n) = O(g(n))$ as well as $f(n) = \Omega(g(n))$. We also say $f(n) = o(g(n))$ and $g(n) = \omega(f(n))$ if $\lim_{n \to \infty} \frac{f(n)}{g(n)} = 0$.
guaranteeing network connectivity. Although it is not known whether the achieved averaging time is
optimal for all $\epsilon$, we demonstrate that the constructed chain does attain the optimal scaling law in terms
of another mixing metric $T_{\mathrm{fill}}(\tilde{P}, c)$ (cf. (3)), among all chains lifted from one with an approximately
(in the order sense) uniform stationary distribution on $G(n, r)$. In Appendix C, we provide another
algorithm, the LADA-U algorithm, where the nonreversible chain is carefully designed to ensure an 
exact uniform stationary distribution (which accounts for the suffix "U"), by allowing some controlled 
diffusive behavior. It is shown that LADA-U can achieve the same scaling law in averaging time as
the centralized and distributed LADA algorithms, but needs a larger transmission range than the minimum
connectivity requirement, mainly due to the induced diffusive behavior.

Finally, we propose a cluster-based LADA (C-LADA) variant to further improve on the message 
complexity. This is motivated by the common assumption that nodes in some networks, such as wireless 
sensor networks, are densely deployed, where it is often more efficient to have co-located nodes clustered, 
effectively behaving as a single entity. In this scenario, after initiation, only inter-cluster communication 
and intra-cluster broadcast are needed to update the values of all nodes. Different from the centralized 
algorithm, clustering is performed through a distributed clustering algorithm; the induced graph is usually 
not a grid, so the distributed LADA algorithm, rather than the grid-based one, is suitably modified and 
applied. The same time complexity as LADA is achieved, but the number of messages per iteration is 
reduced from $\Theta(n)$ to $\Theta(r^{-2})$.

In this paper, for ease of exposition we focus on synchronous algorithms without gossip constraints, i.e., 
in each time slot, every node updates its values based on its neighbors' values in the previous iteration. 
Nonetheless, these algorithms can also be realized in a deterministic gossip fashion, by simulating at most
$d_{\max}$ matchings for each iteration, where $d_{\max}$ is the maximum node degree. Also note that while most
of our analysis is conducted on the geometric random graph, the algorithms themselves can generally be 
applied on any network topology. 

Our paper is organized as follows. In Section II, we formulate the problem and review some important
results in Markov chain theory. In Section III, we introduce the notion of lifting Markov chains and
present two pseudo-algorithms for distributed consensus based on chain-lifting. In Section IV, the LADA 
algorithm for grid networks is proposed, which is then extended to a centralized algorithm for geometric 
random graphs. In Section V, we present the distributed LADA algorithm for wireless networks and 
analyze its performance. The C-LADA algorithm is treated in Section VI. Several important related 
works are discussed in Section VII. Finally, conclusions are given in Section VIII.
II. Problem Formulation and Preliminaries

A. Problem Formulation 

Consider a network represented by a connected graph G = (V, E), where the vertex set V contains n 
nodes and $E$ is the edge set. Let vector $\mathbf{x}(0) = [x_1(0), \cdots, x_n(0)]^T$ contain the initial values observed
by the nodes, and $x_{\mathrm{ave}} = \frac{1}{n} \sum_{i=1}^{n} x_i(0)$ denote the average. The goal is to compute $x_{\mathrm{ave}}$ in a distributed and
robust fashion. As we mentioned, such designs are basic building blocks for distributed and cooperative
information processing in wireless networks. Let $\mathbf{x}(t)$ be the vector containing node values at the $t$th
iteration. Without loss of generality, we consider the set of initial values $\mathbf{x}(0) \in \mathbb{R}^{+n}$, and define the
$\epsilon$-averaging time^4 as

$$T_{\mathrm{ave}}(\epsilon) = \sup_{\mathbf{x}(0) \in \mathbb{R}^{+n}} \inf \left\{ t : \|\mathbf{x}(t) - x_{\mathrm{ave}} \mathbf{1}\|_1 \le \epsilon \|\mathbf{x}(0)\|_1 \right\} \tag{1}$$

where $\|\mathbf{x}\|_1 = \sum_i |x_i|$ is the $l_1$ norm^5.
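The stopping rule inside (1) can be sketched for a single initial vector as follows. The function name `averaging_time` and the cap `t_max` are hypothetical, and the sketch evaluates only the inner infimum for one given $\mathbf{x}(0)$ and a fixed $W$; the supremum over all nonnegative initial vectors in (1) is not computed.

```python
def averaging_time(W, x0, eps, t_max=10_000):
    """First t with ||x(t) - x_ave 1||_1 <= eps * ||x(0)||_1 under x(t+1) = W x(t).

    Returns None if the threshold is not reached within t_max steps.
    """
    n = len(x0)
    ave = sum(x0) / n
    target = eps * sum(abs(v) for v in x0)
    x = list(x0)
    for t in range(t_max + 1):
        if sum(abs(v - ave) for v in x) <= target:
            return t
        x = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    return None


# Two nodes that average their values in one exchange.
W = [[0.5, 0.5], [0.5, 0.5]]
print(averaging_time(W, [2.0, 0.0], 0.1))   # 1
```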

We will mainly use the geometric random graph [10], [11] to model a wireless network in our analysis.
In the geometric random graph $G(n, r(n))$, $n$ nodes are uniformly and independently distributed on a
unit square $[0, 1]^2$, and $r(n)$ is the common transmission range of all nodes. It is known that the choice
of $r(n) \ge \sqrt{\frac{2 \log n}{n}}$ is required to ensure the graph is connected with high probability (w.h.p.)^6 [10], [11].

B. Markov Chain Preliminaries 

The averaging time of consensus algorithms evolving according to a stationary Markov chain is closely 
related to the chain's convergence time. In this section, we briefly review two metrics that characterize 
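the convergence time of a Markov chain, i.e., the mixing time and the fill time. For $\epsilon > 0$, the $\epsilon$-mixing
time of an irreducible and aperiodic Markov chain $P$ with stationary distribution $\pi$ is defined in terms
of the total variation distance as [7]

$$T_{\mathrm{mix}}(P, \epsilon) = \sup_i \inf \left\{ t : \|P^t(i, \cdot) - \pi\|_{TV} = \tfrac{1}{2} \|P^t(i, \cdot) - \pi\|_1 \le \epsilon \right\} = \sup_{p(0)} \inf \left\{ t : \|p(t) - \pi\|_1 \le 2\epsilon \right\}, \tag{2}$$

^4 For the more general case $\mathbf{x}(0) \in \mathbb{R}^n$, the corresponding expression in (1) is $\|\mathbf{x}(t) - x_{\mathrm{ave}} \mathbf{1}\|_1 \le \epsilon \|\mathbf{x}(0) - \min_i x_i(0) \mathbf{1}\|_1$.

^5 In the literature of distributed consensus, the $l_2$ norm $\|\mathbf{x}\|_2 = \sqrt{\sum_i |x_i|^2}$ has also been used in measuring the averaging
time [1], [5]. The two metrics are closely related. Define $T_{\mathrm{ave},2}(\epsilon) \triangleq \sup_{\mathbf{x}(0) \in \mathbb{R}^{+n}} \inf \{ t : \|\mathbf{x}(t) - x_{\mathrm{ave}} \mathbf{1}\|_2 \le \epsilon \|\mathbf{x}(0)\|_2 \}$. It is
not difficult to show that when $\epsilon = O(\frac{1}{n})$, then $T_{\mathrm{ave},2}(\epsilon) = O(T_{\mathrm{ave}}(\epsilon))$.

^6 With probability approaching 1 as $n \to \infty$.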



SUBMITTED TO IEEE TRANS. INFORM. THEORY. 



where $p(t)$ is the probability distribution of the chain at time $t$, and $P^t(i, \cdot)$ is the $i$th row of the $t$-step
transition matrix (i.e., $p(t)$ given $p(0) = e_i$^7). The second equality is due to the convexity of the $l_1$
norm.
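For a small chain, the mixing time in (2) can be evaluated directly from matrix powers. The sketch below is illustrative only; the helper names and the two-state example chain are assumptions made here for concreteness.

```python
def mat_mul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mixing_time(P, pi, eps, t_max=10_000):
    """Smallest t with max_i (1/2) * ||P^t(i, .) - pi||_1 <= eps, as in (2)."""
    n = len(P)
    Pt = [[float(i == j) for j in range(n)] for i in range(n)]   # P^0 = identity
    for t in range(t_max + 1):
        worst = max(0.5 * sum(abs(Pt[i][j] - pi[j]) for j in range(n)) for i in range(n))
        if worst <= eps:
            return t
        Pt = mat_mul(Pt, P)
    return None


# Two-state chain with P^t(0, 0) = 1/2 + (1/2) * 0.8^t, so the distance is (1/2) * 0.8^t.
P = [[0.9, 0.1], [0.1, 0.9]]
print(mixing_time(P, [0.5, 0.5], 0.1))   # 8
```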

Another related metric, known as the fill time [12] (or the separate time [13]), is defined for $0 < c < 1$
as

$$T_{\mathrm{fill}}(P, c) \triangleq \sup_i \inf \left\{ t : P^t(i, \cdot) \ge (1 - c) \pi \right\}. \tag{3}$$

For certain Markov chains, it is (relatively) easier to obtain an estimate for $T_{\mathrm{fill}}$ than for $T_{\mathrm{mix}}$. The
following lemma comes in handy in establishing an upper bound for the mixing time in terms of $T_{\mathrm{fill}}$, and
will be used in our analysis.

Lemma 2.1: For any irreducible and aperiodic Markov chain $P$,

$$T_{\mathrm{mix}}(P, \epsilon) \le \left[ \log(\epsilon^{-1}) / \log(c^{-1}) + 1 \right] T_{\mathrm{fill}}(P, c). \tag{4}$$

Proof: The lemma follows directly from a well-known result in Markov chain theory (see the
fundamental theorem in Section 3.3 of [14]). It states that for a stationary Markov chain $P$ on a finite state
space with a stationary distribution $\pi$, if there exists a constant $0 < c < 1$ such that $P(i, j) \ge (1 - c) \pi_j$
for all $i, j$, then the distribution of the chain at time $t$ can be expressed as a mixture of the stationary
distribution and another arbitrary distribution $r(t)$ as

$$p(t) = (1 - c^t) \pi + c^t r(t). \tag{5}$$

Thus

$$\|p(t) - \pi\|_1 = c^t \|\pi - r(t)\|_1 \le 2 c^t. \tag{6}$$

Now, for any irreducible and aperiodic chain, by (3), we have $P^{\tau}(i, j) \ge (1 - c) \pi_j$ for any $i, j$ when
$\tau \ge T_{\mathrm{fill}}(P, c)$. It follows from the above that for any starting distribution,

$$\tfrac{1}{2} \|p(t) - \pi\|_1 \le c^{\lfloor t / T_{\mathrm{fill}}(P, c) \rfloor}, \tag{7}$$

and the desired result follows immediately by equating the right-hand side of (7) with $\epsilon$. ■
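The bound (4) can be checked numerically on a toy chain. The following sketch computes the fill time of (3) and the mixing time of (2) by brute force; the helper `first_time` and the two-state chain are illustrative assumptions, not part of the analysis above.

```python
import math


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def first_time(P, row_ok, t_max=10_000):
    """Smallest t such that row_ok(row) holds for every row of P^t."""
    n = len(P)
    Pt = [[float(i == j) for j in range(n)] for i in range(n)]
    for t in range(t_max + 1):
        if all(row_ok(row) for row in Pt):
            return t
        Pt = mat_mul(Pt, P)
    return None


def fill_time(P, pi, c):
    """Smallest t with P^t(i, j) >= (1 - c) * pi_j for all i, j, as in (3)."""
    return first_time(P, lambda row: all(p >= (1 - c) * q for p, q in zip(row, pi)))


def mixing_time(P, pi, eps):
    """Smallest t with total variation distance at most eps from every start, as in (2)."""
    return first_time(P, lambda row: 0.5 * sum(abs(p - q) for p, q in zip(row, pi)) <= eps)


P = [[0.9, 0.1], [0.1, 0.9]]
pi = [0.5, 0.5]
c, eps = 0.5, 0.1
t_fill = fill_time(P, pi, c)      # 4
t_mix = mixing_time(P, pi, eps)   # 8
bound = (math.log(1 / eps) / math.log(1 / c) + 1) * t_fill   # right-hand side of (4)
print(t_fill, t_mix, bound)
```

For this chain the bound is loose (about 17.3 against a mixing time of 8), which is consistent with (4) being an upper bound rather than an estimate.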

^7 $e_i$ is the vector with 1 at the $i$th position and 0 elsewhere.
III. Fast Distributed Consensus Via Lifting Markov Chains

The idea of Markov chain lifting was first investigated in [8], [9] to accelerate convergence. A
lifted chain is constructed by creating multiple replica states corresponding to each state in the original
chain, such that the transition probabilities and stationary probabilities of the new chain conform to those
of the original chain. Formally, for a given Markov chain $P$ defined on state space $V$ with stationary
probabilities $\pi$, a chain $\tilde{P}$ defined on state space $\tilde{V}$ with stationary probabilities $\tilde{\pi}$ is a lifted chain of $P$
if there is a mapping $f : \tilde{V} \to V$ such that

$$\pi_v = \sum_{\tilde{v} \in f^{-1}(v)} \tilde{\pi}_{\tilde{v}}, \quad \forall v \in V \tag{8}$$

and

$$P_{uv} = \sum_{\tilde{u} \in f^{-1}(u),\ \tilde{v} \in f^{-1}(v)} \frac{\tilde{\pi}_{\tilde{u}}}{\pi_u} \tilde{P}_{\tilde{u}\tilde{v}}, \quad \forall u, v \in V. \tag{9}$$

Moreover, $P$ is called a collapsed chain of $\tilde{P}$.
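To make (8) and (9) concrete, the sketch below collapses a lifted walk on a cycle. The particular lifted chain (each cycle node split into a clockwise and a counter-clockwise state, with reversal probability `a`) is an assumption chosen for illustration in the spirit of [8]; it is not one of the chains constructed later in this paper.

```python
def lifted_cycle(n, a):
    """Lifted walk on an n-cycle (n >= 3) with states (v, d), d = +1 or -1.

    The walker keeps its direction and moves on with probability 1 - a,
    and reverses direction (moving the other way) with probability a.
    """
    states = [(v, d) for v in range(n) for d in (1, -1)]
    index = {s: i for i, s in enumerate(states)}
    P = [[0.0] * (2 * n) for _ in range(2 * n)]
    for (v, d), i in index.items():
        P[i][index[((v + d) % n, d)]] += 1 - a
        P[i][index[((v - d) % n, -d)]] += a
    return states, P


def collapse(states, P_lift, pi_lift, n):
    """Collapsed chain obtained from (8) and (9) with the mapping f((v, d)) = v."""
    pi = [0.0] * n
    for (v, _), p in zip(states, pi_lift):
        pi[v] += p                                          # equation (8)
    P = [[0.0] * n for _ in range(n)]
    for i, (u, _) in enumerate(states):
        for j, (v, _) in enumerate(states):
            P[u][v] += pi_lift[i] / pi[u] * P_lift[i][j]    # equation (9)
    return pi, P


n = 6
states, P_lift = lifted_cycle(n, 0.1)
pi_lift = [1 / (2 * n)] * (2 * n)     # uniform, since this lifted chain is doubly stochastic
pi, P = collapse(states, P_lift, pi_lift, n)
print(pi[0], P[0][1], P[0][5])        # approximately 1/6, 0.5, 0.5
```

The collapsed chain is the simple random walk on the cycle, whatever the value of `a`, while the lifted chain itself is nonreversible.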

Given the close relationship between Markov chains and distributed consensus algorithms, it is natural 
to ask whether the nonreversible chain-lifting technique could be used to speed up distributed consensus 
in wireless networks. We answer the above question in two steps. First, we show that by allowing each 
node to maintain multiple values, mimicking the multiple lifted states from a single state, a nonreversible 
chain on a lifted state space can be simulated^8. In this section, we provide two pseudo-algorithms to
illustrate this idea. With such pseudo-algorithms in place, the second step is to explicitly construct
fast-mixing nonreversible chains that result in improved averaging times compared with existing algorithms.
The latter part will be treated in Sections IV and V, where we provide detailed algorithms for both grid
networks and general wireless networks modeled by geometric random graphs.

Consider a wireless network modeled as $G(V, E)$ with $|V| = n$. A procedure that realizes averaging
through chain-lifting is given in Pseudo-algorithm 1, where $P$ is some $G$-conformant ergodic chain on
$V$ with a uniform stationary distribution.

Lemma 3.1: Using Pseudo-algorithm 1, $\mathbf{x}(t) \to x_{\mathrm{ave}} \mathbf{1}$ and the averaging time $T_{\mathrm{ave}}(\epsilon) \le T_{\mathrm{mix}}(\tilde{P}, \epsilon/2)$.
Proof: Let $\tilde{p}(t)$ be the distribution of $\tilde{P}$ at time $t$, and $\tilde{\pi}$ the stationary distribution of $\tilde{P}$. As
$\tilde{P}$ is ergodic and the linear iteration in Pseudo-algorithm 1 is sum-preserving, it can be shown that

^8 Although sometimes used interchangeably in related works, in this study it is better to differentiate between nodes (in a
network) and states (in a Markov chain), since several states in the lifted chain correspond to a single node in a network.
Algorithm 1 Pseudo-Algorithm 1.

1) Each node $v \in V$ maintains $b_v$ copies of values $y_v^1, \cdots, y_v^{b_v}$, the sum of which is initially set equal
to $x_v(0)$. Correspondingly, we obtain a new state space $\tilde{V}$ and a mapping $f : \tilde{V} \to V$ with the
understanding that $\{y_v^l\}_{l=1,\cdots,b_v}$ can be alternatively represented as $\{y_{\tilde{v}}\}_{\tilde{v} \in f^{-1}(v)}$.

2) At each time instant $t$, each node updates its values based on the values of its neighbors. Let
the vector $\mathbf{y}$ contain the copies of values of all nodes, i.e., $\mathbf{y} = [\mathbf{y}_1^T, \cdots, \mathbf{y}_{|V|}^T]^T$ with $\mathbf{y}_v =
[y_v^1, \cdots, y_v^{b_v}]^T$. The values are updated according to the linear iteration $\mathbf{y}(t+1) = \tilde{P}^T \mathbf{y}(t)$, where
$\tilde{P}$ is some ergodic chain on $\tilde{V}$ lifted from $P$.

3) At each time instant $t$, each node estimates the average value by summing up all its copies of
values: $x_v(t) = \sum_{l=1}^{b_v} y_v^l(t)$.



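$\mathbf{y}(t) \to n x_{\mathrm{ave}} \tilde{\pi}$ and $\mathbf{x}(t) \to x_{\mathrm{ave}} \mathbf{1}$ due to the lifting property (8) and the uniform stationary distribution
of $P$. Furthermore, we have $\mathbf{y}(t) = n x_{\mathrm{ave}} \tilde{p}(t)$. For $t \ge T_{\mathrm{mix}}(\tilde{P}, \epsilon/2)$,

$$\begin{aligned}
\|\mathbf{x}(t) - x_{\mathrm{ave}} \mathbf{1}\|_1 &= \sum_{v \in V} |x_v(t) - x_{\mathrm{ave}}| = \sum_{v \in V} \Big| \sum_{l=1}^{b_v} y_v^l(t) - x_{\mathrm{ave}} \Big| = \sum_{v \in V} \Big| \sum_{\tilde{v} \in f^{-1}(v)} \big( y_{\tilde{v}}(t) - \tilde{\pi}_{\tilde{v}} n x_{\mathrm{ave}} \big) \Big| \\
&\le \sum_{\tilde{v} \in \tilde{V}} |y_{\tilde{v}}(t) - \tilde{\pi}_{\tilde{v}} n x_{\mathrm{ave}}| = n x_{\mathrm{ave}} \sum_{\tilde{v} \in \tilde{V}} |\tilde{p}_{\tilde{v}}(t) - \tilde{\pi}_{\tilde{v}}| \le n x_{\mathrm{ave}} \epsilon = \epsilon \|\mathbf{x}(0)\|_1,
\end{aligned}$$

where the third equality is by $\pi_v = \sum_{\tilde{v} \in f^{-1}(v)} \tilde{\pi}_{\tilde{v}} = \frac{1}{n}$, $\forall v \in V$, the first inequality is by the triangle
inequality, and the last inequality is by the definition of mixing time in (2). ■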
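The three steps of Pseudo-algorithm 1 can be sketched on a toy example. The sketch assumes a lifted walk on an odd cycle with two copies per node (one travelling clockwise, one counter-clockwise) and reversal probability `a`; this particular chain and the function name are illustrative assumptions, not the chains proposed in Sections IV and V.

```python
def lifted_cycle_average(x0, a, steps):
    """Pseudo-algorithm 1 on a cycle with b_v = 2 copies per node.

    cw[v] and ccw[v] are the copies of node v that travel clockwise and
    counter-clockwise; together they realize y(t+1) = P~^T y(t).
    """
    n = len(x0)
    cw = [v / 2 for v in x0]      # step 1: the copies of node v sum to x_v(0)
    ccw = [v / 2 for v in x0]
    for _ in range(steps):        # step 2: linear iteration on the lifted state space
        new_cw = [(1 - a) * cw[v - 1] + a * ccw[v - 1] for v in range(n)]
        new_ccw = [(1 - a) * ccw[(v + 1) % n] + a * cw[(v + 1) % n] for v in range(n)]
        cw, ccw = new_cw, new_ccw
    return [cw[v] + ccw[v] for v in range(n)]   # step 3: sum the copies


x0 = [7.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
x = lifted_cycle_average(x0, a=1 / 7, steps=200)
print(x)   # every entry is close to the average, 1.0
```

An odd cycle length is used so that this lifted chain is aperiodic; the collapsed chain is the simple random walk on the cycle, which has a uniform stationary distribution as the lemma requires.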

From the above discussion, we see that for a wireless network modeled as $G = (V, E)$, as long as
we can find a fast-mixing chain whose collapsed chain is $G$-conformant and has a uniform stationary
distribution on $V$, we automatically obtain a fast distributed averaging algorithm on $G$. The crux is then
to design such lifted chains, which are typically nonreversible to ensure fast mixing. While the fact that
the collapsed Markov chain possesses a uniform stationary distribution facilitates distributed consensus,
this does not preclude the possibility of achieving consensus by lifting chains with non-uniform stationary
distributions. In fact, the non-uniformity of the stationary distribution can be "smoothed out" by incorporating
some auxiliary variables that asymptotically estimate the stationary distribution. Such a procedure allows
us more flexibility in finding a fast-mixing chain on a given graph. This idea is presented in
Pseudo-algorithm 2, where $P$ is some $G$-conformant ergodic chain on $V$.

Algorithm 2 Pseudo-Algorithm 2.

1) Each node $v \in V$ maintains $b_v$ pairs of values $(y_v^l, w_v^l)$, $l = 1, \cdots, b_v$, whose initial values satisfy
$\sum_l y_v^l(0) = x_v(0)$ and $\sum_l w_v^l(0) = 1$. Correspondingly, we obtain a new state space $\tilde{V}$ and a
mapping $f : \tilde{V} \to V$.

2) Let the vector $\mathbf{y}$ contain the copies $y_v^l$ for all $v \in V$ and $l = 1, \cdots, b_v$, and similarly denote $\mathbf{w}$.
At each time instant, the values are updated with

$$\mathbf{y}(t+1) = \tilde{P}^T \mathbf{y}(t),$$
$$\mathbf{w}(t+1) = \tilde{P}^T \mathbf{w}(t),$$

where $\tilde{P}$ is some ergodic chain on $\tilde{V}$ lifted from $P$.

3) At each time instant, each node estimates the average value by

$$x_v(t) = \frac{\sum_{l=1}^{b_v} y_v^l(t)}{\sum_{l=1}^{b_v} w_v^l(t)}. \tag{10}$$

Lemma 3.2: a) Using Pseudo-algorithm 2, $\mathbf{x}(t) \to x_{\mathrm{ave}} \mathbf{1}$.

b) Suppose for the collapsed chain $P$, there exists some constant $c' > 0$ such that the stationary distribution
$\pi_v \ge \frac{c'}{n}$ for all $v \in V$. Then Algorithm 2 has an averaging time $T_{\mathrm{ave}}(\epsilon) = O\left(\log \epsilon^{-1}\, T_{\mathrm{fill}}(\tilde{P}, c)\right)$
for any constant $0 < c < 1$.



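Proof: a) Denote the stationary distribution of $\tilde{P}$ by $\tilde{\pi}$. By a similar argument as that of Lemma
3.1, $\lim_{t \to \infty} \mathbf{y}(t) = n x_{\mathrm{ave}} \tilde{\pi}$ and $\lim_{t \to \infty} \mathbf{w}(t) = n \tilde{\pi}$. It follows that $\lim_{t \to \infty} \mathbf{x}(t) = x_{\mathrm{ave}} \mathbf{1}$.

b) Let $\tilde{p}(t)$ be the distribution of $\tilde{P}$ at time $t$. For any $\epsilon > 0$ and any constant $0 < c < 1$, Lemma
2.1 says that there exists some time $\tau = O\left(\log \epsilon^{-1}\, T_{\mathrm{fill}}(\tilde{P}, c)\right)$, such that for any $t \ge \tau$ and any initial
distribution $\tilde{p}(0)$,

$$\|\tilde{p}(t) - \tilde{\pi}\|_1 \le \frac{\epsilon (1 - c) c'}{2}.$$

Moreover, for $t \ge T_{\mathrm{fill}}(\tilde{P}, c)$, we have for $\forall v \in V$,

$$\sum_{\tilde{v} \in f^{-1}(v)} w_{\tilde{v}}(t) \ge (1 - c)\, n \pi_v \ge (1 - c) c'. \tag{11}$$

Thus, for $\forall t \ge \tau$,

$$\begin{aligned}
\|\mathbf{x}(t) - x_{\mathrm{ave}} \mathbf{1}\|_1 &= \sum_{v \in V} |x_v(t) - x_{\mathrm{ave}}| \\
&\le \frac{1}{(1 - c) c'} \sum_{v \in V} \Big| \sum_{\tilde{v} \in f^{-1}(v)} \big( y_{\tilde{v}}(t) - w_{\tilde{v}}(t) x_{\mathrm{ave}} \big) \Big| \\
&\le \frac{1}{(1 - c) c'} \sum_{\tilde{v} \in \tilde{V}} \Big( |y_{\tilde{v}}(t) - n \tilde{\pi}_{\tilde{v}} x_{\mathrm{ave}}| + |n \tilde{\pi}_{\tilde{v}} - w_{\tilde{v}}(t)|\, x_{\mathrm{ave}} \Big) \\
&\le \frac{1}{(1 - c) c'} \left( \frac{\epsilon (1 - c) c'}{2} + \frac{\epsilon (1 - c) c'}{2} \right) n x_{\mathrm{ave}} \\
&= \epsilon \|\mathbf{x}(0)\|_1.
\end{aligned}$$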



Remark: It is clear that $\mathbf{w}$ serves to estimate the scaling factor $n \pi_v$ at each iteration. Alternatively, a
pre-computation phase can be employed where each node $v$ computes $\sum_{\tilde{v} \in f^{-1}(v)} \tilde{\pi}_{\tilde{v}}$. Then only the $\mathbf{y}$
values need to be communicated.
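The role of the weight variables in Pseudo-algorithm 2 can be illustrated with a chain whose stationary distribution is not uniform. The sketch below assumes a trivial lifting (one copy per node, $b_v = 1$) and a lazy walk on a three-node path; both choices and the function name are assumptions made for brevity, and the chain is reversible, so the example shows only the compensation mechanism, not any speed-up.

```python
def pseudo_algorithm_2(P, x0, steps):
    """Run y <- P^T y and w <- P^T w, then estimate the average by y_v / w_v."""
    n = len(x0)
    y = list(x0)        # value copies: sum_v y_v(0) equals the sum of the initial values
    w = [1.0] * n       # weight copies: one unit per node
    for _ in range(steps):
        y = [sum(P[u][v] * y[u] for u in range(n)) for v in range(n)]
        w = [sum(P[u][v] * w[u] for u in range(n)) for v in range(n)]
    return y, w, [yv / wv for yv, wv in zip(y, w)]


# Lazy walk on a 3-node path; its stationary distribution is (1/4, 1/2, 1/4).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
y, w, estimates = pseudo_algorithm_2(P, [3.0, 0.0, 6.0], 60)
print(y)           # close to [2.25, 4.5, 2.25]: not a consensus on its own
print(estimates)   # close to [3.0, 3.0, 3.0]: the ratio recovers the average
```

Here `w` converges to $n\pi$, so dividing by it removes the non-uniformity that the stationary distribution imprints on `y`.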

In the above, we have proposed two pseudo-algorithms to illustrate the idea of distributed consensus
through lifting Markov chains, leaving out the details of constructing fast-mixing Markov chains. In the
following two sections, we present one efficient realization for each of these two pseudo-algorithms, on
regular networks and geometric random networks, respectively.



IV. LADA Algorithm On Grid 

In this section, we present a LADA algorithm on a $k \times k$ grid. This algorithm utilizes the direction
information (not the absolute geographic location) of neighbors to construct a fast-mixing Markov chain,
and is a specific example of Pseudo-Algorithm 1 described in Section III. While existing works typically
assume a torus structure to avoid edge effects and simplify analysis, we consider the grid structure which
is a more realistic model for planar networks, and explicitly deal with the edge effects. This algorithm is
then extended to a centralized algorithm for general wireless networks as modeled by a geometric random
graph. Our analysis directly addresses the standard definition of mixing time in (2). Besides interest in
its own right, results in this section will also facilitate our analysis in the following sections.
Fig. 1. Node neighbors and values in the grid

A. Algorithm 

Consider a $k \times k$ grid. For each node $i$, denote its east, north, west and south neighbor (if exists)
respectively by $N_i^0$, $N_i^1$, $N_i^2$ and $N_i^3$, as shown in Fig. 1. Each node $i$ maintains four values indexed
according to the four directions counter-clockwise (see Fig. 1). The east, north, west and south values of
node $i$, denoted respectively by $y_i^0$, $y_i^1$, $y_i^2$ and $y_i^3$, are initialized to

$$y_i^l(0) = \frac{x_i(0)}{4}, \quad l = 0, \cdots, 3. \tag{12}$$

At each time instant $t$, the east value of node $i$ is updated with

$$y_i^0(t+1) = \left(1 - \frac{1}{k}\right) y_{N_i^2}^0(t) + \frac{1}{2k} \left( y_{N_i^2}^1(t) + y_{N_i^2}^3(t) \right). \tag{13}$$

That is, the east value of $i$ is updated by a weighted sum of the previous values of its west neighbor,
with the majority $(1 - \frac{1}{k})$ coming from the east value, and a fraction of $\frac{1}{2k}$ coming from each of the north
and south values. If $i$ is a west border node (i.e., one without a west neighbor), then the west,
north and south values of itself are used as substitutes:

$$y_i^0(t+1) = \left(1 - \frac{1}{k}\right) y_i^2(t) + \frac{1}{2k} \left( y_i^1(t) + y_i^3(t) \right). \tag{14}$$

Fig. 2. Updating of east values for a normal node (right) and a west boundary node (left)

The above discussion is illustrated in Fig. 2. Intuitively the west value is "bounced back" when it reaches
the west boundary and becomes the east value. As we will see, this is a natural procedure on the grid
structure to ensure that the iteration evolves according to a doubly stochastic matrix, which is desirable for
averaging. Moreover, the fact that the information continues to propagate when it reaches the boundary is
essential for the associated chain to mix rapidly. Similarly, the north value of $i$ is updated by a weighted
sum of the previous values of its south neighbor, with the majority coming from the north value, and so
on. Each node then sums its four values to obtain an estimate of the global average:

$$x_i(t+1) = \sum_{l=0}^{3} y_i^l(t+1). \tag{15}$$
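The update rules of this subsection can be sketched in code as follows. The sketch assumes that the rules for the north, west and south values mirror the east-value rule, as the text indicates; the table layout `[cx][cy]` and the function names are illustrative choices.

```python
def lada_grid_step(y, k):
    """One update of the directional values y = [E, N, W, S], each a k-by-k table
    indexed as [cx][cy] (cx grows eastwards, cy grows northwards)."""
    E, N, W, S = y
    keep, turn = 1 - 1 / k, 1 / (2 * k)
    nE, nN, nW, nS = ([[0.0] * k for _ in range(k)] for _ in range(4))
    for cx in range(k):
        for cy in range(k):
            # East value: pull from the west neighbor, or from the node itself at the west border.
            sx, straight = (cx - 1, E) if cx > 0 else (cx, W)
            nE[cx][cy] = keep * straight[sx][cy] + turn * (N[sx][cy] + S[sx][cy])
            # West value: mirror image of the east rule.
            sx, straight = (cx + 1, W) if cx < k - 1 else (cx, E)
            nW[cx][cy] = keep * straight[sx][cy] + turn * (N[sx][cy] + S[sx][cy])
            # North value: pull from the south neighbor, or bounce at the south border.
            sy, straight = (cy - 1, N) if cy > 0 else (cy, S)
            nN[cx][cy] = keep * straight[cx][sy] + turn * (E[cx][sy] + W[cx][sy])
            # South value: mirror image of the north rule.
            sy, straight = (cy + 1, S) if cy < k - 1 else (cy, N)
            nS[cx][cy] = keep * straight[cx][sy] + turn * (E[cx][sy] + W[cx][sy])
    return [nE, nN, nW, nS]


def lada_grid(x0, steps):
    """Initialize the four values of every node to x_i(0)/4, iterate, and sum them per node."""
    k = len(x0)
    y = [[[x0[cx][cy] / 4 for cy in range(k)] for cx in range(k)] for _ in range(4)]
    for _ in range(steps):
        y = lada_grid_step(y, k)
    return [[sum(y[l][cx][cy] for l in range(4)) for cy in range(k)] for cx in range(k)]


k = 4
x0 = [[0.0] * k for _ in range(k)]
x0[0][0] = 16.0                        # all mass at the south-west corner; the average is 1.0
estimates = lada_grid(x0, 2000)
print(estimates[k - 1][k - 1])         # close to 1.0
```

Every new value is a convex combination with weights $(1 - \frac{1}{k})$, $\frac{1}{2k}$, $\frac{1}{2k}$, and every old value is used with total weight one, which is why the total sum is preserved. The 2000 iterations are a generous illustrative choice, not a tuned stopping time.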

B. Analysis 

Assume nodes in the $k \times k$ grid are indexed by $(x, y) \in [0, k-1] \times [0, k-1]$, starting from the
south-west corner. The nonreversible Markov chain $\tilde{P}$ underlying the above algorithm is illustrated in
Fig. 3. Each state $s \in \mathcal{S}$ is represented by a triplet $s = (x, y, l)$, with $l \in \{E, W, N, S\}$ denoting the
specific state within a node in terms of its direction. The transition probabilities of $\tilde{P}$ for an east node
are as follows (similarly for $l \in \{N, W, S\}$):

$$\tilde{P}\big((x, y, E), (x+1, y, E)\big) = 1 - \frac{1}{k}, \quad x < k-1 \tag{16}$$

$$\tilde{P}\big((x, y, E), (x, y, W)\big) = 1 - \frac{1}{k}, \quad x = k-1 \tag{17}$$

$$\tilde{P}\big((x, y, E), (x, y+1, N)\big) = \tilde{P}\big((x, y, E), (x, y-1, S)\big) = \frac{1}{2k}, \quad 0 < y < k-1 \tag{18}$$

$$\tilde{P}\big((x, y, E), (x, y, S)\big) = \tilde{P}\big((x, y, E), (x, y-1, S)\big) = \frac{1}{2k}, \quad y = k-1 \tag{19}$$

$$\tilde{P}\big((x, y, E), (x, y+1, N)\big) = \tilde{P}\big((x, y, E), (x, y, N)\big) = \frac{1}{2k}, \quad y = 0. \tag{20}$$
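The transition rules (16)-(20), together with their counterparts for the other three directions, can be tabulated and checked numerically. In the sketch below the directions are encoded as 0 (east), 1 (north), 2 (west), 3 (south), and the rules for non-east states are assumed to follow by symmetry; the function name and the dictionary representation are illustrative.

```python
from collections import defaultdict

DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]   # 0: east, 1: north, 2: west, 3: south


def grid_chain(k):
    """Transition probabilities of the lifted chain as {state: {next_state: probability}}."""
    def move(x, y, m):
        nx, ny = x + DIRS[m][0], y + DIRS[m][1]
        if 0 <= nx < k and 0 <= ny < k:
            return (nx, ny, m)               # ordinary move into the neighboring node
        return (x, y, (m + 2) % 4)           # at a border: stay put and take the opposite direction

    P = {}
    for x in range(k):
        for y in range(k):
            for l in range(4):
                row = defaultdict(float)
                row[move(x, y, l)] += 1 - 1 / k               # keep direction, as in (16)-(17)
                for m in ((l + 1) % 4, (l + 3) % 4):
                    row[move(x, y, m)] += 1 / (2 * k)         # turn, as in (18)-(20)
                P[(x, y, l)] = dict(row)
    return P


P = grid_chain(5)
row_sums = [sum(row.values()) for row in P.values()]
col_sums = defaultdict(float)
for row in P.values():
    for target, p in row.items():
        col_sums[target] += p
print(min(col_sums.values()), max(col_sums.values()))   # both close to 1.0
```

Unit row sums and unit column sums together are exactly the double stochasticity used in the analysis.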

Fig. 3. Nonreversible chain used in the LADA algorithm on a grid: outgoing probabilities for the states of node $i$ are depicted.

It can be verified that $\tilde{P}$ is doubly stochastic, irreducible and aperiodic. Therefore, $\tilde{P}$ has a uniform
stationary distribution on its state space, and so does its collapsed chain. Consequently each $x_i(t) \to x_{\mathrm{ave}}$
by Lemma 3.1. Moreover, since the nonreversible random walk $\tilde{P}$ most likely keeps its direction,
occasionally makes a turn, and never turns back, it mixes substantially faster than a simple random walk
(where the next node is chosen uniformly from the neighbors of the current node). Our main results on
the mixing time of this chain, and the averaging time of the corresponding LADA algorithm are given
below.

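Lemma 4.1: The $\epsilon$-mixing time of $\tilde{P}$ is a) $T_{\mathrm{mix}}(\tilde{P}, \epsilon) = O(k \log(\epsilon^{-1}))$, for any $\epsilon > 0$;
b) $T_{\mathrm{mix}}(\tilde{P}, \epsilon) = \Omega(k)$, for a sufficiently small constant $\epsilon$.

Proof: a) See Appendix A. The key is to show that $T_{\mathrm{fill}} = O(k)$. The desired result then follows
from Lemma 2.1.

b) We are left to show that $T_{\mathrm{mix}}(\tilde{P}, \epsilon) = \Omega(k)$ for a constant $\epsilon$ which is sufficiently small (less than
3/32 in this case). For the random walk starting from $s_0 \in \mathcal{S}$, denote by $s_t$ the state it visits at time $t$ if
it never makes a turn. Note that $(1 - \frac{1}{k})^k$ is an increasing function in $k$, hence $(1 - \frac{1}{k})^k \ge \frac{1}{4}$ for $k \ge 2$.
Thus we have for $t < k$,

$$\tfrac{1}{2} \big\| \tilde{P}^t(s_0, \cdot) - \tilde{\pi} \big\|_1 \ge \tfrac{1}{2} \Big| \tilde{P}^t(s_0, s_t) - \tfrac{1}{4k^2} \Big| = \tfrac{1}{2} \Big[ \big(1 - \tfrac{1}{k}\big)^t - \tfrac{1}{4k^2} \Big] \tag{21}$$

$$> \tfrac{1}{2} \Big( \tfrac{1}{4} - \tfrac{1}{16} \Big) = \tfrac{3}{32} > \epsilon \tag{22}$$

for $0 < \epsilon < \frac{3}{32}$, where the second inequality follows from $(1 - \frac{1}{k})^t > (1 - \frac{1}{k})^k \ge \frac{1}{4}$ and $\frac{1}{4k^2} \le \frac{1}{16}$. The result
follows from the definition of mixing time in (2). ■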

Theorem 4.1: For the LADA algorithm on a $k \times k$ grid, a) $T_{\mathrm{ave}}(\epsilon) = O(k \log(\epsilon^{-1}))$ for any $\epsilon > 0$;
b) $T_{\mathrm{ave}}(\epsilon) = \Omega(k)$ for a sufficiently small constant $\epsilon$.

Proof: a) Follows from Lemma 3.1 and Lemma 4.1 a). 

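b) Note that the proof of Lemma 4.1 b) also implies that for $k \ge 3$, for any initial state $s_0 \in \mathcal{S}$, when
$t < k$, there is at least one state $s \in \mathcal{S}$ for which $\tilde{P}^t(s_0, s) > (1 - \frac{1}{k})^k \ge \frac{8}{27}$. Suppose state $s$ is some
state belonging to some node $v$. Thus for $t < k$ ($k \ge 3$),

$$|x_v(t) - x_{\mathrm{ave}}| = \Big| \sum_{s' \in f^{-1}(v)} \tilde{P}^t(s_0, s') - \frac{1}{k^2} \Big| \cdot \|\mathbf{x}(0)\|_1 \ge \Big| \tilde{P}^t(s_0, s) - \frac{1}{k^2} \Big| \cdot \|\mathbf{x}(0)\|_1 > \frac{5}{27} \|\mathbf{x}(0)\|_1, \tag{23}$$

i.e., node $v$ has not reached an average estimate in this scenario (when $0 < \epsilon < \frac{5}{27}$). ■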

C. A Centralized Grid-based Algorithm for Wireless Networks 

The regular grid structure considered above does appear in some applications, and often serves as a
first step towards modeling a realistic network. In this section, we explore a celebrated model for wireless
networks, geometric random graphs, and present a centralized algorithm which achieves an $\epsilon$-averaging
time of $O(r^{-1} \log(\epsilon^{-1}))$ on $G(n, r)$. The algorithm relies on a central controller to perform tessellation
and clustering, and simulates the LADA algorithm on the grid proposed above on the resultant 2-d grid.
This is a common approach in the literature (e.g., [10]), where the main purpose is to explore the best
achievable performance in wireless networks, with implementation details ignored.

Assume that the unit area is tessellated into $k^2 = \frac{5}{r^2}$ squares (clusters). By this tessellation, a node
in a given cluster is adjacent to all nodes in the four edge-neighboring clusters. Denote the number of
nodes in a given cluster $m$ by $n_m$. Then for a geometric random graph $n_m \ge 1$ for all $m$ w.h.p. [10]. One
node in each cluster is selected as a cluster-head. Denote the index of the cluster where node $i$ lies by $C_i$.
For each cluster $m$, denote its east, north, west and south neighboring cluster (if exists) respectively by
$N_m^0$, $N_m^1$, $N_m^2$ and $N_m^3$. Every cluster-head maintains four values corresponding to the four directions from
east to south counter-clockwise, denoted respectively by $y_m^0$, $y_m^1$, $y_m^2$ and $y_m^3$ for cluster $m$. In the initialization
stage, every node transmits its value to the cluster-head. The cluster-head of cluster $m$ computes the sum
of the values within the cluster and initializes all its four values to

$$y_m^l(0) = \frac{1}{4} \sum_{C_i = m} x_i(0), \quad l = 0, \cdots, 3. \tag{24}$$

At each time instant $t$, the cluster-heads of neighboring clusters communicate and update their values
following exactly the same rules as the LADA algorithm on the grid. Each cluster-head then calculates
an estimate of the global average from its four values, and broadcasts this estimate to its
members, so that every node $i$ obtains

$$x_i(t+1) = \frac{k^2}{n} \sum_{l=0}^{3} y_{C_i}^l(t+1). \tag{25}$$
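The tessellation and initialization steps of the centralized algorithm can be sketched as below. The sketch assumes $k = \lceil \sqrt{5}/r \rceil$, so that each square has side at most $r/\sqrt{5}$ and any two nodes in edge-neighboring squares are within distance $r$; the rounding, the cell indexing and the function names are assumptions made here for illustration.

```python
import math


def tessellate(points, r):
    """Assign each point of the unit square to a square cell of side 1/k <= r / sqrt(5)."""
    k = math.ceil(math.sqrt(5) / r)
    cells = {}
    for i, (px, py) in enumerate(points):
        cx = min(int(px * k), k - 1)      # clamp points that lie exactly on the far border
        cy = min(int(py * k), k - 1)
        cells.setdefault((cx, cy), []).append(i)
    return k, cells


def initial_cluster_values(cells, x0):
    """Each of the four directional values of a cluster starts at a quarter of the cluster sum."""
    return {m: [sum(x0[i] for i in members) / 4] * 4 for m, members in cells.items()}


points = [(0.10, 0.10), (0.15, 0.12), (0.90, 0.95)]
k, cells = tessellate(points, r=0.5)
values = initial_cluster_values(cells, [4.0, 8.0, 2.0])
print(k, cells)     # 5 {(0, 0): [0, 1], (4, 4): [2]}
print(values)       # {(0, 0): [3.0, 3.0, 3.0, 3.0], (4, 4): [0.5, 0.5, 0.5, 0.5]}
```

With so few points most cells are empty; the analysis relies on the transmission range being large enough that every cell contains at least one node w.h.p.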

Theorem 4.2: The centralized algorithm has an $\epsilon$-averaging time $T_{\mathrm{ave}}(\epsilon) = O(r^{-1} \log(\epsilon^{-1}))$ on the
geometric random graph $G(n, r)$ with common transmission radius $r > \sqrt{\frac{20 \log n}{n}}$ w.h.p. Moreover, for a
sufficiently small constant $\epsilon$, $T_{\mathrm{ave}}(\epsilon) = \Theta(r^{-1})$.

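Proof: We can appeal to uniform convergence in the law of large numbers using Vapnik-Chervonenkis
theory as in [10] to bound the number of nodes in each cluster as follows:

$$\Pr\left( \max_{1 \le m \le k^2} \left| \frac{n_m}{n} - \frac{1}{k^2} \right| \le \epsilon(n) \right) > 1 - \delta(n) \tag{26}$$

when $n$ is sufficiently large relative to $\epsilon(n)$ and $\delta(n)$ (see [10] for the explicit condition). This is satisfied
if we choose $\epsilon(n) = \delta(n) = \frac{4 \log n}{n}$. Thus
we have for all $m$, $n_m \ge \frac{n}{k^2} - 4 \log n = \frac{n r^2}{5} - 4 \log n$, which is at least 1 for sufficiently large $n$ if
$r > \sqrt{\frac{20 \log n}{n}}$. In this case, we have that $c_2 \frac{n}{k^2} \le n_m \le c_1 \frac{n}{k^2}$ for all $m$ for some constants $c_1, c_2 > 0$ w.h.p.

By Lemma 4.1 a), for any $\epsilon > 0$, there exists some $\tau = T_{\mathrm{mix}}(\tilde{P}, \frac{\epsilon}{2 c_1}) = O(r^{-1} \log(\epsilon^{-1}))$ such that for
all $t \ge \tau$,

$$\|\mathbf{x}(t) - x_{\mathrm{ave}} \mathbf{1}\|_1 = \sum_{m=1}^{k^2} n_m \Big| \frac{k^2}{n} \sum_{l=0}^{3} y_m^l(t) - x_{\mathrm{ave}} \Big| \le c_1 \sum_{m=1}^{k^2} \Big| \sum_{l=0}^{3} y_m^l(t) - \frac{n x_{\mathrm{ave}}}{k^2} \Big| \le \epsilon \|\mathbf{x}(0)\|_1,$$

where the last inequality follows a similar argument as in the proof of Lemma 3.1.

To prove the latter part of the theorem, note that $\|\mathbf{x}(t) - x_{\mathrm{ave}} \mathbf{1}\|_1 \ge c_2 \sum_{m=1}^{k^2} \big| \sum_{l=0}^{3} y_m^l(t) - \frac{n x_{\mathrm{ave}}}{k^2} \big|$.
The rest follows a similar argument as in the proof of Theorem 4.1 b). ■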

In large dynamic wireless networks, it is often impossible to have a central controller that maintains 
a global coordinate system and clusters the nodes accordingly. In the following sections, we investigate 
some more practical algorithms, which can be applied to wireless networks with no central controller or 
global knowledge available to nodes. 
V. Distributed LADA Algorithm for Wireless Networks

In practice, purely distributed algorithms requiring no central coordination are typically preferred. In this section, we propose a fully distributed LADA algorithm for wireless networks, which is an instantiation of Pseudo-Algorithm 2 in Section III. As we mentioned, while our analysis is conducted on $G(n, r(n))$, our design can generally be applied to any network topology.

A. Neighbor Classification 

As with the LADA algorithm on a grid, LADA for general wireless networks utilizes coarse location information of neighbors to construct fast-mixing nonreversible chains. Due to the irregularity of node locations, a neighbor classification procedure is needed. Specifically, a neighbor $j$ of node $i$ is said to be a type-$l$ neighbor of $i$, denoted as $j \in \mathcal{N}_i^l$, if

$$\angle (X_j - X_i) \in \left[ \frac{l\pi}{2} - \frac{\pi}{4},\ \frac{l\pi}{2} + \frac{\pi}{4} \right), \quad l = 0, \dots, 3, \tag{27}$$

where $X_i$ denotes the geometric location of node $i$ (whose accurate information is not required). That is, each neighbor $j$ of $i$ belongs to one of the four regions each spanning 90 degrees, corresponding to east (0), north (1), west (2) and south (3). Note that if $i \in \mathcal{N}_j^l$, then $j \in \mathcal{N}_i^{l+2 \pmod 4}$. We denote the number of type-$l$ neighbors of node $i$ by $d_i^l = |\mathcal{N}_i^l|$ (except for boundary cases discussed below).
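The classification rule can be sketched in a few lines of Python. This is an illustrative sketch rather than the paper's implementation: the function names are hypothetical, the half-open sector boundaries are an assumption (the text only states that each region spans 90 degrees), and virtual neighbors of boundary nodes are not handled. The reverse direction is assigned by construction so that $j$ being a type-$l$ neighbor of $i$ always makes $i$ a type-$(l+2 \bmod 4)$ neighbor of $j$.

```python
import math

def neighbor_type(xi, xj):
    """Return 0 (east), 1 (north), 2 (west) or 3 (south) for neighbor j seen from node i."""
    angle = math.atan2(xj[1] - xi[1], xj[0] - xi[0])  # angle in (-pi, pi]
    # Shift by 45 degrees so that each 90-degree sector maps to one integer.
    return int(math.floor((angle + math.pi / 4) / (math.pi / 2))) % 4

def classify_neighbors(positions, r):
    """For every node, build four lists of neighbor indices, one per direction."""
    n = len(positions)
    nbrs = [[[] for _ in range(4)] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= r:
                l = neighbor_type(positions[i], positions[j])
                nbrs[i][l].append(j)
                # Enforce symmetry: i is a type-(l+2 mod 4) neighbor of j.
                nbrs[j][(l + 2) % 4].append(i)
    return nbrs

# Four hypothetical node locations in the unit square, transmission range 0.2.
positions = [(0.5, 0.5), (0.6, 0.52), (0.48, 0.62), (0.9, 0.9)]
nbrs = classify_neighbors(positions, r=0.2)
```

Here `nbrs[i][l]` plays the role of $\mathcal{N}_i^l$ and `len(nbrs[i][l])` the role of $d_i^l$ for a non-boundary node.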

In the literature, wireless networks are often modeled on a unit torus or sphere to avoid edge effects in performance analysis [5], [10]. In our study, we explicitly deal with the edge effects by considering the following modification, as illustrated in Fig. 4. A boundary node is a node within distance $r$ from one of the boundaries, e.g., node $i$ in Fig. 4. For a boundary node $i$, we create mirror images of its neighbors with respect to the boundary. If a neighbor $j$ has an image located within the transmission range of $i$, node $j$ (besides its original role) is considered as a virtual neighbor of $i$, whose direction is determined by the image's location with respect to the location of $i$. For example, in Fig. 4, node $j$ is both a north and a virtual east neighbor of $i$, and node $i$ is a virtual east neighbor of itself. Specifically, we use $\tilde{\mathcal{N}}_i^0$ to denote the set of virtual east neighbors of an east boundary node $i$, and use $\hat{\mathcal{N}}_i^0$ to denote the set of virtual east neighbors of a north or south boundary node $i$. Similarly, $\tilde{\mathcal{N}}_i^1$ denotes the set of virtual north neighbors of a north boundary node $i$, and $\hat{\mathcal{N}}_i^1$ denotes that of an east or west boundary node, and so on for virtual west and south neighbors. Informally, the tilde is used for the case where the direction of the virtual neighbors and the boundary "match", while the hat is used for the "mismatch" scenarios. As we will
see, they play different roles in the LADA algorithm. For example, in Fig. 4, we have $i, j, k \in \tilde{\mathcal{N}}_i^0$, and $l \in \hat{\mathcal{N}}_i^1$. It can be shown that if $i \in \tilde{\mathcal{N}}_j^l$, then $j \in \tilde{\mathcal{N}}_i^l$, while if $i \in \hat{\mathcal{N}}_j^l$, then $j \in \hat{\mathcal{N}}_i^{l+2 \pmod 4}$. For a boundary node $i$, $d_i^l$ is instead defined as the total number of physical and virtual neighbors in direction $l$, i.e., $d_i^l = |\mathcal{N}_i^l| + |\tilde{\mathcal{N}}_i^l| + |\hat{\mathcal{N}}_i^l|$. With this modification, every type-$l$ neighborhood has an effective area $\frac{\pi r^2}{4}$, hence $d_i^l$ is roughly the same for all $i$ and $l$. We also expect that as $n$ increases, the fluctuation in $d_i^l$ diminishes. This is summarized in the following lemma, which will be used in our subsequent analysis.

Fig. 4. Illustration of neighbor classification and virtual neighbors for boundary nodes. Note that for an east boundary node $i$, there can only be virtual east neighbors of the first category ($k \in \tilde{\mathcal{N}}_i^0$), and virtual north and south neighbors of the second category ($l \in \hat{\mathcal{N}}_i^1$).
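Lemma 5.1: With high probability, the number of type-$l$ neighbors of $i$ satisfies$^{9}$

$$d_i^l = \begin{cases} \Theta(n r^2) & \text{if } r > \sqrt{\frac{16 \log n}{\pi n}}, \\ \frac{n \pi r^2}{4} \left( 1 \pm O(r) \right) & \text{if } r = \Omega\left( \left( \frac{\log n}{n} \right)^{1/3} \right). \end{cases} \tag{28}$$

Proof: We can appeal to the Vapnik-Chervonenkis theory as in [10] to bound the number of nodes in each type-$l$ neighborhood as follows:

$$\Pr\left\{ \sup_{i, l} \left| \frac{d_i^l}{n} - \frac{\pi r^2}{4} \right| \le \frac{4 \log n}{n} \right\} > 1 - \frac{4 \log n}{n}. \tag{29}$$

Hence, we have $\left| d_i^l - \frac{n \pi r^2}{4} \right| \le 4 \log n$ with probability at least $1 - \frac{4 \log n}{n}$ for all nodes $i$ and directions $l$. Therefore, if $r > \sqrt{\frac{16 \log n}{\pi n}}$, we have $d_i^l = \frac{n \pi r^2}{4} \left( 1 \pm O\left( \frac{\log n}{n r^2} \right) \right) = \Theta(n r^2)$. If $r = \Omega\left( \left( \frac{\log n}{n} \right)^{1/3} \right)$, we have $d_i^l = \frac{n \pi r^2}{4} \left( 1 \pm O\left( \left( \frac{\log n}{n} \right)^{1/3} \right) \right) = \frac{n \pi r^2}{4} \left( 1 \pm O(r) \right)$. ■

$^{9}$The stronger result regarding $r = \Omega\left( \left( \frac{\log n}{n} \right)^{1/3} \right)$ is required for the LADA-U algorithm presented in Appendix C.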
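To make the two regimes of the lemma concrete, the following sketch evaluates the concentration interval $\frac{n\pi r^2}{4} \pm 4\log n$ from the proof and the radius at which its lower end reaches zero. The function names are hypothetical and the natural logarithm is an assumption.

```python
import math

def type_l_degree_bounds(n, r):
    """Interval [mean - 4 log n, mean + 4 log n] for the number of type-l neighbors."""
    mean = n * math.pi * r ** 2 / 4   # expected count in a quarter disk of radius r
    deviation = 4 * math.log(n)       # deviation allowed by the bound in the proof
    return mean - deviation, mean + deviation

def min_radius_for_theta(n):
    """Radius sqrt(16 log n / (pi n)) above which the lower bound is positive."""
    return math.sqrt(16 * math.log(n) / (math.pi * n))

low, high = type_l_degree_bounds(n=10000, r=0.1)
r_min = min_radius_for_theta(10000)
```

For radii above `r_min` the lower end of the interval is positive, which is what makes $d_i^l = \Theta(nr^2)$ in the first regime.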
B. Algorithm

The LADA algorithm for general wireless networks works as follows. Each node $i$ holds four pairs of values $(y_i^l, w_i^l)$, $l = 0, \dots, 3$, corresponding to the four directions counter-clockwise: east, north, west and south. The values are initialized with

$$y_i^l(0) = \frac{x_i(0)}{4}, \quad w_i^l(0) = \frac{1}{4}, \quad l = 0, \dots, 3. \tag{30}$$

At time $t$, each node $i$ broadcasts its four values. In turn, it updates its east value $y_i^0$ with

$$y_i^0(t+1) = \sum_{j \in \mathcal{N}_i^2} \frac{1}{d_j^0} \left[ (1-p)\, y_j^0(t) + \frac{p}{2} \left( y_j^1(t) + y_j^3(t) \right) \right], \tag{31}$$

where $p = \Theta(r)$ is assumed. This is illustrated in Fig. 5. That is, the east value of node $i$ is updated by a sum contributed by all its west neighbors $j \in \mathcal{N}_i^2$; each contribution is a weighted sum of the values of node $j$ in the last slot, with the major portion (weight $1-p$) coming from the east value, and a fraction of $\frac{p}{2}$ coming from the north as well as the south value.

As in the grid case, boundary nodes must be treated specially. Let us consider two specific cases:
1) If $i$ is a west boundary node (as shown in Fig. 6), then we must include an additional term

$$\sum_{j \in \tilde{\mathcal{N}}_i^2} \frac{1}{d_j^2} \left[ (1-p)\, y_j^2(t) + \frac{p}{2} \left( y_j^1(t) + y_j^3(t) \right) \right] \tag{32}$$

in (31), i.e., values from both physical and virtual west neighbors (of the first category) are used. Moreover, for the virtual west neighbors, the west rather than east values are used. This is similar to the grid case, where the west values are bounced back and become east values when they reach the west boundary, so that the information continues to propagate. The factor $\frac{1}{d_j^2}$ rather than $\frac{1}{d_j^0}$ is adopted here to ensure that the outgoing probabilities of each state of each node $j \in \tilde{\mathcal{N}}_i^2$ sum to 1.
2) If $i$ is a north or south boundary node (as shown in Fig. 7), however, the sum in (31) is replaced with

$$\sum_{j \in \mathcal{N}_i^2 \cup \hat{\mathcal{N}}_i^2} \frac{1}{d_j^0} \left[ (1-p)\, y_j^0(t) + \frac{p}{2} \left( y_j^1(t) + y_j^3(t) \right) \right], \tag{33}$$

i.e., the east, north and south values of both physical and virtual west neighbors (of the second category) are used. Note that the virtual neighbors of the second category are meant only to compensate for the loss of neighbors for north or south boundary nodes, so unlike the previous case, their east or west values continue to propagate in the usual direction.

If $i$ is both a west and a north (or south) boundary node, the above two cases should be combined. The purpose of introducing the virtual neighbors described above is to ensure the approximate regularity of the underlying graph of the associated chain, so that the randomized effect is evenly spread out over the network. The north, west and south values, as well as the corresponding $w$ values, are updated in the same fashion. Node $i$ computes its estimate of $x_{ave}$ with

$$x_i(t+1) = \frac{\sum_{l=0}^{3} y_i^l(t+1)}{\sum_{l=0}^{3} w_i^l(t+1)}. \tag{34}$$

The detailed algorithm is given in Algorithm 3$^{10}$.

$^{10}$We do not explicitly differentiate between the non-boundary and boundary cases, since the corresponding terms are automatically zero for non-boundary nodes.

We remark that even the exact knowledge of directions is not critical for the LADA algorithm. For example, if a neighbor $j$ of node $i$ is roughly on the border of two regions, it is fine to categorize $j$ to either region, as long as $j$ categorizes $i$ correspondingly (i.e., $i \in \mathcal{N}_j^{l+2 \pmod 4} \Leftrightarrow j \in \mathcal{N}_i^l$).

C. Analysis

Denote $y = [y_0^T, y_1^T, y_2^T, y_3^T]^T$, with $y_l = [y_1^l, y_2^l, \dots, y_n^l]^T$; similarly denote $w$. The above iteration can be written as $y(t+1) = P_1^T y(t)$ and $w(t+1) = P_1^T w(t)$. Using the fact that if $i \in \mathcal{N}_j^l \cup \hat{\mathcal{N}}_j^l$, then $j \in \mathcal{N}_i^{l+2 \pmod 4} \cup \hat{\mathcal{N}}_i^{l+2 \pmod 4}$, and if $i \in \tilde{\mathcal{N}}_j^l$, then $j \in \tilde{\mathcal{N}}_i^l$, it can be shown that each row in

SUBMITTED TO IEEE TRANS. INFORM. THEORY. 



Fig. 6. Update of east value of a west boundary node $i$: the west value of virtual west neighbor $j \in \tilde{\mathcal{N}}_i^2$ is used.


$P_1$ (i.e., each column in $P_1^T$) sums to 1, hence $P_1$ is a stochastic matrix (see Fig. 8 for an illustration). On a finite connected 2-d network, the formed chain $P_1$ is irreducible and aperiodic by construction. Since the incoming probabilities of a state do not sum to 1 (see Eq. (31) and Fig. 5)$^{11}$, $P_1$ is not doubly stochastic and does not have a uniform stationary distribution. The LADA algorithm for general wireless networks is a special case of Pseudo-Algorithm 2 in Section III, and it converges to the average of node values by Lemma 3.2 a). In the rest of this section, we analyze the performance of the LADA algorithm on geometric random graphs.

$^{11}$Due to the irregularity of the network, the west neighbors of a node do not all have exactly the same number of east neighbors.

Lemma 5.2: On the geometric random graph $G(n, r)$ with $r = \Omega\left( \sqrt{\frac{\log n}{n}} \right)$, with high probability, the Markov chain $P_1$ constructed in the LADA algorithm has an approximately uniform stationary distribution, i.e., for any $s \in \mathcal{S}$, $\pi(s) = \Theta\left( \frac{1}{n} \right)$, and $T_{fill}(P_1, c) = O(r^{-1})$ for some constant $0 < c < 1$.

The proof is given in Appendix B. Essentially, we first consider the expected location of the random walk $P_1$ (with respect to the node distribution), which is shown to evolve according to the random walk $P$ on a $k \times k$ grid with $k = \Theta(r^{-1})$ when $p = \Theta(r)$. Thus the expected location of $P_1$ can be anywhere
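Fig. 7. Update of east value of a north boundary node $i$: the east value of virtual west neighbor $j \in \hat{\mathcal{N}}_i^2$ is used.

Algorithm 3 LADA Algorithm

for $i = 1$ to $n$ do
    $y_i^l(0) \Leftarrow x_i(0)$, $w_i^l(0) \Leftarrow 1$, $l = 0, 1, 2, 3$
end for
choose $p = \Theta(r)$, $t \Leftarrow 0$
while $\|x(t) - x_{ave}\mathbf{1}\|_1 > \epsilon$ do
    for $i = 1$ to $n$ do
        for $l = 0$ to $3$ do
            $y_i^l(t+1) \Leftarrow$ update as in (31), with the boundary terms (32) and (33), rotated to direction $l$
            $w_i^l(t+1) \Leftarrow$ the same update applied to the $w$ values
        end for
        $x_i(t+1) \Leftarrow \sum_{l=0}^{3} y_i^l(t+1) \big/ \sum_{l=0}^{3} w_i^l(t+1)$
    end for
    $t \Leftarrow t + 1$
end while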
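The iteration can be tried out numerically with the following minimal sketch, which makes several simplifying assumptions that are not part of the algorithm above: nodes sit on a jittered grid on the unit torus, so that there are no boundary nodes and no virtual neighbors; $p$ is a fixed constant; and the loop runs for a fixed number of steps instead of testing the stopping rule. All function names are hypothetical. The update for direction $l$ is the east update (31) rotated by $l$ quarter turns.

```python
import math
import random

def build_torus_neighbors(m, r, jitter, seed=0):
    """Jittered m x m grid on the unit torus; returns neighbor lists per node and direction."""
    rng = random.Random(seed)
    pos = [((a + rng.uniform(-jitter, jitter)) / m, (b + rng.uniform(-jitter, jitter)) / m)
           for a in range(m) for b in range(m)]
    n = len(pos)
    nbrs = [[[] for _ in range(4)] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Shortest displacement from i to j on the torus.
            dx = (pos[j][0] - pos[i][0] + 0.5) % 1.0 - 0.5
            dy = (pos[j][1] - pos[i][1] + 0.5) % 1.0 - 0.5
            if math.hypot(dx, dy) <= r:
                l = int(math.floor((math.atan2(dy, dx) + math.pi / 4) / (math.pi / 2))) % 4
                nbrs[i][l].append(j)            # j is a type-l neighbor of i
                nbrs[j][(l + 2) % 4].append(i)  # i is a type-(l+2) neighbor of j
    return nbrs

def lada_step(vals, nbrs, p):
    """One synchronous update of the four directional values of every node."""
    new = [[0.0] * 4 for _ in vals]
    for i in range(len(vals)):
        for l in range(4):
            # Neighbors on the opposite side of direction l feed the type-l value of i.
            for j in nbrs[i][(l + 2) % 4]:
                d = len(nbrs[j][l])  # number of type-l neighbors of j (at least 1, since i is one)
                new[i][l] += ((1 - p) * vals[j][l]
                              + 0.5 * p * (vals[j][(l + 1) % 4] + vals[j][(l + 3) % 4])) / d
    return new

def run_lada(x0, nbrs, p, steps):
    """Run the y and w iterations and return the per-node estimates and the final y values."""
    y = [[x] * 4 for x in x0]     # y_i^l(0) = x_i(0)
    w = [[1.0] * 4 for _ in x0]   # w_i^l(0) = 1
    for _ in range(steps):
        y, w = lada_step(y, nbrs, p), lada_step(w, nbrs, p)
    return [sum(yi) / sum(wi) for yi, wi in zip(y, w)], y

nbrs = build_torus_neighbors(m=5, r=0.3, jitter=0.1)
x0 = [float(i) for i in range(25)]
estimates, y_final = run_lada(x0, nbrs, p=0.3, steps=1000)
```

With this layout every node has at least one neighbor in each direction, so the total of the $y$ values is preserved at every step, mirroring the column-stochastic property of $P_1^T$ used in the analysis.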
Fig. 8. The Markov chain used in LADA: combined outgoing probabilities (solid lines) and combined incoming probabilities (dotted line) for the east state of node $i$ are depicted. Labels in the figure: north states of north neighbors; east, north and south states of west neighbors; east states of east neighbors; south states of south neighbors.

on the grid in $O(k)$ steps (see Section IV). Then, we take the random node locations into account and further show that when $n \to \infty$, the exact location of the random walk $P_1$ can be anywhere in the network in $O(r^{-1})$ steps.

Theorem 5.1: On the geometric random graph $G(n, r)$ with $r = \Omega\left( \sqrt{\frac{\log n}{n}} \right)$, the LADA algorithm has an $\epsilon$-averaging time $T_{ave}(\epsilon) = O(r^{-1} \log(\epsilon^{-1}))$ with high probability.

Proof: When $r = \Omega\left( \sqrt{\frac{\log n}{n}} \right)$, the Markov chain $P_1$ constructed in the LADA algorithm has an approximately uniform stationary distribution by Lemma 5.2, and so does its collapsed chain. Thus Lemma 3.2 b) can be invoked to show that $T_{ave}(\epsilon) = O\left( T_{fill}(P_1, c) \log(\epsilon^{-1}) \right) = O(r^{-1} \log(\epsilon^{-1}))$. ■



We have also explored a variant of the LADA algorithm, called LADA-U, which is a realization of Pseudo-Algorithm 1. The nonreversible chain is carefully designed to ensure a uniform stationary distribution (accounting for the suffix "U"), by allowing transitions between the east and the west, as well as between the north and south states for each node. It can be shown that LADA-U can achieve the same scaling law in averaging time as LADA, but requires a transmission range larger than the minimum connectivity requirement, mainly due to the induced diffusive behavior. In particular, a sufficient condition for the same scaling law as LADA to hold is $r = \Omega\left( \left( \frac{\log n}{n} \right)^{1/3} \right)$. The LADA-U algorithm and its performance analysis are summarized in Appendix C for possible interest of the reader.
D. $T_{fill}$ Optimality of LADA Algorithm

To conclude this section, we would like to discuss the following question: what is the optimal performance of distributed consensus through lifting Markov chains on a geometric random graph, and how close does LADA come to the optimum? A straightforward lower bound on the averaging time of this class of algorithms is given by the diameter of the graph, hence $T_{ave}(\epsilon) = \Omega(r^{-1})$. Therefore, for a constant $\epsilon$, the LADA algorithm is optimal in the $\epsilon$-averaging time. For $\epsilon = O(1/n)$, it is not known whether the lower bound $\Omega(r^{-1})$ can be further tightened, and whether LADA achieves the optimal $\epsilon$-averaging time in scaling law. Nevertheless, we provide a partial answer to the question by showing that the constructed chain attains the optimal scaling law of $T_{fill}(P, c)$ for a constant $c \in (0, 1)$, among all chains lifted from one with an approximately uniform stationary distribution on $G(n, r)$. For our analysis, we first introduce two invariants of a Markov chain, the conductance and the resistance. The conductance measures the chance of a random walk leaving a set after a single step, and is defined for the corresponding chain $P$ as [15]

$$\Phi(P) = \min_{S \subset V,\ 0 < \pi(S) < 1} \frac{Q(S, \bar{S})}{\pi(S)\, \pi(\bar{S})} \tag{35}$$

where $\bar{S}$ is the complement of $S$ in $V$, $Q(A, B) = \sum_{i \in A} \sum_{j \in B} Q_{ij}$, and for an edge $e = ij$, $Q(e) = Q_{ij} = \pi_i P_{ij}$ is often interpreted as the capacity of the edge in combinatorial research. The resistance is defined in terms of multi-commodity flows. A flow$^{12}$ in the underlying graph $G(P)$ of $P$ is a function $f : \Gamma \to \mathbb{R}^+$ which satisfies

$$\sum_{\gamma \in \Gamma_{uv}} f(\gamma) = \pi(u)\, \pi(v) \quad \forall u, v \in V,\ u \ne v \tag{36}$$

where $\Gamma_{uv}$ is the set of all simple directed paths from $u$ to $v$ in $G(P)$ and $\Gamma = \bigcup_{u \ne v} \Gamma_{uv}$. The congestion parameter $R(f)$ of a flow $f$ is defined as

$$R(f) \triangleq \max_{e} \frac{1}{Q(e)} \sum_{\gamma \in \Gamma:\ \gamma \ni e} f(\gamma). \tag{37}$$

The resistance of the chain $P$ is defined as the minimum value of $R(f)$ over all flows,

$$R(P) = \inf_{f} R(f). \tag{38}$$
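On a small chain, the conductance can be evaluated directly from its definition by enumerating all proper subsets $S$. The sketch below is illustrative only: the function names are hypothetical, the stationary distribution is obtained by plain power iteration (which assumes an irreducible, aperiodic chain), and the enumeration is exponential in the number of states.

```python
from itertools import combinations

def stationary(P, iters=2000):
    """Approximate the stationary distribution by repeatedly applying pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def conductance(P):
    """Minimum over proper subsets S of Q(S, S_bar) / (pi(S) * pi(S_bar))."""
    n = len(P)
    pi = stationary(P)
    best = float("inf")
    for size in range(1, n):
        for S in combinations(range(n), size):
            in_S = set(S)
            # Q(S, S_bar): stationary flow across the cut in one step.
            q = sum(pi[i] * P[i][j] for i in S for j in range(n) if j not in in_S)
            pi_S = sum(pi[i] for i in S)
            best = min(best, q / (pi_S * (1.0 - pi_S)))
    return best

# Lazy random walk on a 4-cycle: stay with probability 1/2, step to each neighbor with 1/4.
lazy_cycle = [[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]]
phi = conductance(lazy_cycle)
```

For the lazy cycle, the minimizing cut separates two adjacent states from the other two, which is the discrete analogue of the straight-line cut used for the geometric random graph later in this section.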

It has been shown that the resistance of an ergodic reversible Markov chain $P$ satisfies $R(P) \le 16\, T_{mix}(P, 1/8)$ [15]. This result does not readily apply to nonreversible chains. Instead, a similar result exists for $T_{fill}$, as given below.

$^{12}$An alternative and equivalent definition of a flow as a function on the edges of graphs can be found in [16].
Lemma 5.3: For any irreducible and aperiodic Markov chain $P$, the resistance satisfies

$$T_{fill}(P, c) \ge (1 - c)\, R(P). \tag{39}$$

Proof: Let $t = T_{fill}(P, c)$. Let $\Gamma_{uv}^{(t)}$ denote the set of all (not necessarily simple) paths of length exactly $t$ from $u$ to $v$ in the underlying graph $G(P)$. $\Gamma_{uv}^{(t)}$ is nonempty by the definition of $T_{fill}$. For each $\gamma \in \Gamma_{uv}^{(t)}$, let $p(\gamma)$ denote the probability that the Markov chain, starting in state $u$, makes the sequence of transitions defined in $\gamma$; thus $\sum_{\gamma \in \Gamma_{uv}^{(t)}} p(\gamma) = P^t(u, v)$. For each $u, v$ and $\gamma \in \Gamma_{uv}^{(t)}$, set

$$f(\gamma) = \frac{\pi(u)\, \pi(v)\, p(\gamma)}{P^t(u, v)} \tag{40}$$

and set $f(\gamma) = 0$ for all other paths. Thus, $\sum_{\gamma \in \Gamma_{uv}^{(t)}} f(\gamma) = \pi(u)\, \pi(v)$. Now, by removing cycles on all paths, we can obtain a flow $f'$ (consisting of simple paths) from $f$ without increasing the throughput on any edge. The flow routed by $f'$ through $e$ is

$$f'(e) \triangleq \sum_{\gamma \ni e} f'(\gamma) \le \sum_{u, v} \sum_{\gamma \in \Gamma_{uv}^{(t)}:\ \gamma \ni e} \frac{\pi(u)\, \pi(v)\, p(\gamma)}{P^t(u, v)} \le \frac{1}{1 - c} \sum_{u, v} \sum_{\gamma \in \Gamma_{uv}^{(t)}:\ \gamma \ni e} \pi(u)\, p(\gamma), \tag{41}$$

where the second inequality follows from the definition of $T_{fill}$. The final double sum in (41) is precisely the probability that the stationary process traverses the oriented edge $e$ within $t$ steps, which is at most $t\, Q(e)$. It then follows that

$$R(f') = \max_{e} \frac{f'(e)}{Q(e)} \le \frac{t}{1 - c}. \tag{42}$$



Lemma 5.4: For the geometric random graph $G(n, r)$ with $r = \Omega\left( \sqrt{\frac{\log n}{n}} \right)$, any $G$-conformant Markov chain $P$ with $\pi(v) = \Theta\left( \frac{1}{n} \right)$, $\forall v \in V$, satisfies the following with high probability: a) the conductance $\Phi(P) = O(r)$, and b) the resistance $R(P) = \Omega(r^{-1})$.

Proof: Consider dividing the square with a line parallel to one of its sides into two parts $S$ and $\bar{S}$ such that $\pi(S) \ge 1/4$ and $\pi(\bar{S}) \ge 1/4$, as illustrated in Fig. 9. Note that such a line always exists and need not be at the center of the square. A node in $S$ must lie in the shadowed region to have a neighbor in $\bar{S}$. For any such node $i$, $\sum_{j \in \bar{S}} P_{ij} \le 1$. Applying the Chernoff bound [17], it can be shown that when $r = \Omega\left( \sqrt{\frac{\log n}{n}} \right)$, the number of nodes in the shadowed area is upper bounded by $2rn$ w.h.p. Therefore, we have

$$\Phi(P) \le \frac{Q(S, \bar{S})}{\pi(S)\, \pi(\bar{S})} \le \frac{2rn \cdot \Theta\left( \frac{1}{n} \right)}{0.25 \cdot 0.25} = O(r), \tag{43}$$

i.e., $\Phi(P) = O(r)$ w.h.p. By the max-flow min-cut theorem [15], [18], the resistance $R$ is related to the conductance $\Phi$ as $R \ge \frac{1}{\Phi}$, thus we have $R(P) = \Omega(r^{-1})$ w.h.p. ■
Fig. 9. Upper bound for the conductance of a Markov chain on $G(n, r)$.

Note that the resistance cannot be reduced by lifting [9]. Combining this fact with Lemma 5.3 and Lemma 5.4 yields the following.

Theorem 5.2: Consider a chain $P$ on the geometric random graph $G(n, r) = (V, E)$ with $r = \Omega\left( \sqrt{\frac{\log n}{n}} \right)$ and $\pi(v) = \Theta\left( \frac{1}{n} \right)$, $\forall v \in V$. For any chain $\tilde{P}$ lifted from $P$ and any constant $0 < c < 1$, $T_{fill}(\tilde{P}, c) = \Omega(r^{-1})$ with high probability.

The above shows that the constructed chain in LADA is optimal in the scaling law for the mixing parameter $T_{fill}$ among all chains lifted from one with an approximately uniform stationary distribution on $G(n, r)$.

VI. Cluster-based LADA Algorithm for Wireless Networks

In Section IV-C, we have presented a centralized algorithm, where the linear iteration is performed on the 2-d grid obtained by tessellating the geometric random graph. Only the cluster-heads are involved in the message exchange. Therefore, compared to the purely distributed LADA algorithm, the centralized algorithm offers an additional gain in terms of the message complexity, which translates directly into power savings for sensor nodes. However, as we have mentioned previously, the assumption of a central controller with knowledge of global coordinates might be unrealistic. This motivates us to study a more general cluster-based LADA (C-LADA) algorithm which alleviates such requirements, and still reaps the benefit of reduced message complexity.

A. C-LADA Algorithm

The idea of C-LADA can be described as follows. The nodes are first clustered using a distributed clustering algorithm given in Appendix D, where no global coordinate information is required. Two
Fig. 10. Illustration of the induced graph from distributed clustering of a realization of $G(300, r(300))$. Nodes are indicated with small dots, cluster-heads with small triangles, cluster adjacencies with solid lines, and the transmission ranges (not clusters) of cluster-heads with dashed circles.

clusters are considered adjacent (or neighbors) if there is a direct link joining them. Assume that through some local information exchange, a cluster-head knows all its neighboring clusters. In the case that two clusters are joined by more than one link, we assume that the cluster-heads of both clusters agree on a single such link being activated. The end nodes of active links are called gateway nodes. The induced graph $\hat{G}$ from clustering is a graph with the vertex set consisting of all cluster-heads and the edge set obtained by joining the cluster-heads of neighboring clusters. In Fig. 10, we illustrate the induced graph as a result of applying our distributed clustering algorithm to a realization of $G(300, r(300))$, where $r(n) = \sqrt{\frac{2 \log n}{n}}$.

As can be seen, the induced graph typically has an arbitrary topology. Neighbor classification on the induced graph is based on the relative location of the cluster-heads, according to a similar rule as described in Section V-A. Let $\mathcal{N}_m^l$ denote the set of type-$l$ neighboring clusters (including virtual neighbors) for cluster $m$, and $d_m^l = |\mathcal{N}_m^l|$. It can be shown that $d_m^l \ge 1$ for any $m$ and $l$ w.h.p. Let $C_i$ be the index of the cluster node $i$ belongs to, and $n_m$ be the number of nodes in cluster $m$. It is convenient to consider another relevant graph $\tilde{G} = (V, \tilde{E})$ constructed from the original network graph $G = (V, E)$ as follows: for any $i, j \in V$, $(i, j) \in \tilde{E}$ if and only if $C_i$ and $C_j$ are neighbors. Moreover, $j$ is considered as a type-$l$ neighbor of $i$ if and only if $C_j$ is a type-$l$ neighboring cluster of $C_i$. It is easy to see that nodes in the same cluster have the same set of type-$l$ neighbors, and hence they would follow the same updating rule if the LADA algorithm is applied. Furthermore, nodes in the same cluster would have the same values at any time, if their initial values are the same. Note that the initial values in a given cluster can be made equal through a simple averaging at the cluster-head. The above allows updating a cluster as a whole at the cluster-head, saving the transmissions of individual nodes. For any cluster $m$, let $D_m^l = \sum_{m' \in \mathcal{N}_m^l} n_{m'}$ be the total number of nodes in the type-$l$ neighboring clusters of $m$, which is equal to the number of type-$l$ neighbors of any node in cluster $m$ in $\tilde{G}$.

Every cluster-head maintains four pairs of values $(y_m^l, w_m^l)$, $l = 0, \dots, 3$, initialized with $y_m^l(0) = \sum_{C_i = m} x_i(0) / (4 n_m)$ and $w_m^l(0) = 1/4$, $l = 0, \dots, 3$. At time $t$, the gateway nodes of neighboring clusters exchange values and forward the received values to the cluster-heads. The cluster-head of cluster $m$ updates its east $y$ value according to

$$y_m^0(t+1) = \sum_{m' \in \mathcal{N}_m^2} \frac{n_{m'}}{D_{m'}^0} \left[ (1-p)\, y_{m'}^0(t) + \frac{p}{2} \left( y_{m'}^1(t) + y_{m'}^3(t) \right) \right], \tag{44}$$

and similarly for other $y$ values and $w$ values, and broadcasts them to its members. Every node computes the estimate of the average with $x_i(t) = \left( \sum_{l=0}^{3} y_{C_i}^l(t) \right) / \left( \sum_{l=0}^{3} w_{C_i}^l(t) \right)$.

It can be verified that the above C-LADA algorithm essentially realizes the LADA algorithm on graph $\tilde{G}$ with the above neighbor classification rule; for any node in cluster $m$, the update rule in (44) is equivalent to the update rule in (31). It follows that $x(t)$ converges to $x_{ave}\mathbf{1}$ as $t \to \infty$, and C-LADA also achieves an $\epsilon$-averaging time of $O(r^{-1} \log(\epsilon^{-1}))$ on geometric random graphs.

B. Message Complexity 

Finally, we demonstrate that C-LADA considerably reduces the message complexity, and hence the energy consumption. For LADA, each node must broadcast its values during each iteration, hence the number of messages transmitted in each iteration is $\Theta(n)$. For C-LADA, there are three types of messages: transmissions between gateway nodes, transmissions from gateway nodes to cluster-heads and broadcasts by cluster-heads. Thus, the number of messages transmitted in each iteration is on the same order as the number of gateway nodes, which is between $K d_{min}$ and $K d_{max}$, where $K$ is the number of clusters, and $d_{min}$ and $d_{max}$ are respectively the minimum and the maximum number of neighboring clusters in the network.

Lemma 6.1: Using the Distributed Clustering Algorithm in Appendix D, the number of neighboring clusters for any cluster $m$ satisfies $4 \le d_m \le 48$, and the number of clusters satisfies $\pi^{-1} r^{-2} \le K \le 2 r^{-2}$.
Proof: The lower bound $d_m \ge 4$ follows from $d_m^l \ge 1$ for any $m$ and $l$. Note that the cluster-heads are at least at a distance $r$ from each other (see Appendix D). Hence, the circles with the cluster-heads as the centers and radius $0.5r$ are non-overlapping. Note also that, for a cluster $m$, the cluster-heads of all its neighboring clusters must lie within distance $3r$ from the cluster-head of $m$. Within the neighborhood of radius $3.5r$ of a cluster-head, there are no more than $\frac{\pi (3.5r)^2}{\pi (0.5r)^2} = 49$ non-overlapping circles of radius $0.5r$, one of which belongs to cluster $m$ itself. This means that the number of neighboring clusters is upper bounded by 48.

Consider the tessellation of the unit square into squares of side $\frac{r}{\sqrt{2}}$. Every such square contains at most one cluster-head, so there are at most $2 r^{-2}$ clusters. On the other hand, in order to cover the whole unit square, there must be at least $\pi^{-1} r^{-2}$ clusters. ■

The theorem below on the message complexity follows immediately. 

Theorem 6.1: The $\epsilon$-message complexity, defined as the total number of messages transmitted in the network to achieve $\epsilon$-accuracy, is $O(n r^{-1} \log(\epsilon^{-1}))$ for the LADA algorithm, and $O(r^{-3} \log(\epsilon^{-1}))$ for the C-LADA algorithm with high probability in the geometric random graph $G(n, r)$ with $r = \Theta\left( \sqrt{\log n / n} \right)$.
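The gain from clustering can be read off by multiplying messages per iteration by the number of iterations: $\Theta(n)$ per iteration for LADA versus $\Theta(K) = \Theta(r^{-2})$ for C-LADA, each over $O(r^{-1}\log(\epsilon^{-1}))$ iterations. The sketch below compares the two scaling expressions with all hidden constants set to 1, which is an assumption made only for illustration; the function names are hypothetical, and the radius $r(n) = \sqrt{2\log n/n}$ is the one used for Fig. 10.

```python
import math

def connectivity_radius(n):
    """Radius r(n) = sqrt(2 log n / n) used for the illustrated realization."""
    return math.sqrt(2 * math.log(n) / n)

def message_scaling(n, r, eps):
    """Scaling expressions for total messages, with all hidden constants set to 1."""
    log_term = math.log(1 / eps)
    lada = n / r * log_term        # n messages per iteration, r^-1 log(1/eps) iterations
    c_lada = r ** -3 * log_term    # r^-2 messages per iteration, same number of iterations
    return lada, c_lada

n = 300
r = connectivity_radius(n)
lada, c_lada = message_scaling(n, r, eps=0.01)
ratio = lada / c_lada  # equals n * r**2, i.e. 2 log n at this radius
```

Under these assumptions the ratio is $n r^2$, so at the connectivity scale the saving grows like $\log n$, and it becomes larger for larger transmission ranges.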

As a side note, cluster-based algorithms have also been designed based on reversible chains [19] to reduce the message complexity.

VII. Related Works 

In this section, we review several relevant works reflecting recent developments on distributed consensus. The reader is referred to [2] for a systematic treatment of distributed computation. Xiao and Boyd [1] derived necessary and sufficient conditions for the deterministic weight matrix $W$ such that the linear iteration $x(t+1) = W x(t)$ asymptotically computes $x_{ave}\mathbf{1}$ as $t \to \infty$. They formulated the fastest linear averaging problem as a semi-definite program, which is convex when $W$ is restricted to be symmetric. Finding the optimal symmetric $W$ with non-negative weights is closely tied to the problem of finding the fastest mixing reversible Markov chain on the graph. Recently, another class of distributed consensus algorithms, the gossip algorithms, has received much interest [20], [21], [5]. Under the gossip constraint, a node can communicate with at most one node at a time. In particular, the randomized gossip algorithm studied by Boyd et al. [5] realizes distributed averaging through asynchronous pairwise relaxation. On a geometric random graph with transmission radius $\Theta\left( \sqrt{\log n / n} \right)$, the time complexity and message complexity to reach $\epsilon$-accuracy are respectively $\Theta\left( n \log \epsilon^{-1} / \log n \right)$ and $\Theta\left( n^2 \log \epsilon^{-1} / \log n \right)$. A recent work by Moallemi and Van Roy [6] proposed consensus propagation, a special form of Gaussian belief propagation, as an alternative for distributed averaging. By avoiding passing information back to where it was received, consensus propagation suppresses to some extent the diffusive nature of a reversible random walk. However, the gain of consensus propagation in time complexity over gossip algorithms quickly diminishes as the average node degree grows, in which case the diffusive behavior is not effectively reduced. In comparison, our LADA algorithms realize distributed consensus with time complexity $O\left( n^{0.5} \log \epsilon^{-1} / \sqrt{\log n} \right)$ and message complexity as low as $O\left( n^{1.5} \log \epsilon^{-1} / (\log n)^{1.5} \right)$ on a connected geometric random graph.

While the above works studied either synchronous or asynchronous parallel algorithms, the work by Savas et al. [22] explored distributed computation of decomposable functions through sequential algorithms, where a node does not transmit messages until it is activated by another node. They proposed two algorithms, SIMPLE-WALK and COALESCENT, with which the transmission tokens follow a simple and a coalescing random walk respectively. Both algorithms provide a gain in message complexity at a cost of time complexity compared with gossip algorithms. The geographic gossip algorithm proposed by Dimakis et al. [23] is another work along this line. Motivated by the observation that standard gossip algorithms can lead to a significant energy waste by repeatedly circulating redundant information, the geographic gossip algorithm reduces the message complexity by greedy geographic routing, for which an overlay network is built so that every pair of nodes can communicate. Note that such a modification entails the absolute location (coordinates) knowledge of the node itself and its neighbors$^{13}$. A notable recent work by Benezit et al. [24] further improves the geographic gossip algorithm by allowing averaging along routing paths. Under the box-greedy routing scheme they propose, further reduction in time and message complexity is achieved. Both time and message complexity of the algorithms in [24] are essentially $\Omega(n \log \epsilon^{-1})$ on geometric random graphs. In comparison, the class of LADA algorithms we propose reduces time complexity by a factor of $O\left( \sqrt{n \log n} \right)$ and increases message complexity by a factor of $O\left( \sqrt{n} / (\log n)^{1.5} \right)$ to $O\left( \sqrt{n / \log n} \right)$, and does not require global coordination. The optimal tradeoff between time and message complexity of distributed consensus warrants further study.

The independent work by Jung and Shah [25] also explored nonreversible chains for fast distributed consensus. However, our scheme is considerably different from theirs. Their algorithm adopts the nonreversible lifting of an existing Markov chain as proposed in [9], which is constructed from a multi-commodity flow of the chain with minimum congestion. For each path in the multi-commodity flow (at least one path between each ordered pair of nodes), a new replica node (state) is created for each internal
node of the path. Therefore, the state space of the new chain is of a size up to n^. Moreover, to construct 
the chain each node in the network must have global knowledge of the network - in particular, the paths 

$^{13}$On the contrary, our algorithm only requires direction knowledge of neighbors.
in the optimal multi-commodity flow that pass through itself. On the other hand, the chain used in our
algorithm is formed in a distributed fashion exploiting only local information and simple computation, 
and the size of the state space is linear in n. As a result, our algorithm is more robust to topology changes: 
when a node joins or leaves the network, only its neighbors need to update their local processing rules. 
Therefore, the class of LADA algorithms we propose is more suited for distributed implementation in 
dynamic large-scale networks. 

VIII. Conclusion 

We propose a class of Location-Aided Distributed Averaging (LADA) algorithms for grid networks and wireless networks, which achieve fast convergence via constructing nonreversible liftings of Markov chains. Our algorithms can realize an $\epsilon$-averaging time of $O(r^{-1} \log(\epsilon^{-1}))$ for any transmission range $r$ that guarantees network connectivity, a significant improvement over existing algorithms based on reversible chains. The cluster-based LADA (C-LADA) variant requires no central controller to perform clustering, while still reaping the benefit of reduced message complexity. Our constructed chain attains the optimal scaling law in terms of an important mixing metric, the fill time [12], among all chains lifted from one with an approximately uniform stationary distribution on geometric random graphs.

Appendix 

A. Proof of Lemma 4.1 

We will show that by time $t = 6k$, the random walk starting from any state visits every state with probability at least $\frac{C}{k^2}$ for some constant $C > 0$. The desired result then follows from Lemma 2.1. Recall that in Section IV-B, we denote each state $s \in \mathcal{S}$ by a triplet $s = (x, y, l)$. To facilitate the analysis, we define an auxiliary parameter $z$ for a state $s$ as follows:

$$z \triangleq \begin{cases} x & l = E \\ 2k - x - 1 & l = W \\ y & l = N \\ 2k - y - 1 & l = S. \end{cases} \tag{45}$$

For example, the numbering for east and west states in a given row is illustrated in Fig. 11. Due to the circular numbering, a horizontal movement of the random walk that keeps the direction (and bounces back at the boundary) can be written as $(y, z) \to (y, z + 1 \pmod{2k})$, and similarly for a vertical movement. Note that by defining the function

$$g(z) = \min\{z, 2k - z - 1\}, \tag{46}$$
Fig. 11. Illustration of circular numbering of east and west states within a row.

we have $g(z) = x$ when $l \in \{E, W\}$, and $g(z) = y$ when $l \in \{N, S\}$.
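The circular numbering is easy to check mechanically. The following sketch encodes (45) and (46) and the straight-line move $z \to z + 1 \pmod{2k}$; the function names are hypothetical, and coordinates are assumed to range over $0, \dots, k-1$.

```python
def z_of(x, y, l, k):
    """Auxiliary parameter z of state (x, y, l) on a k x k grid, following (45)."""
    if l == "E":
        return x
    if l == "W":
        return 2 * k - x - 1
    if l == "N":
        return y
    if l == "S":
        return 2 * k - y - 1
    raise ValueError("direction must be one of E, W, N, S")

def g(z, k):
    """Recover the grid coordinate from z, following (46)."""
    return min(z, 2 * k - z - 1)

def straight_move(z, k):
    """One step that keeps the direction, bouncing back at the boundary."""
    return (z + 1) % (2 * k)

k = 4
z_east_edge = z_of(k - 1, 0, "E", k)        # east state at the east boundary
z_after_bounce = straight_move(z_east_edge, k)
```

After the bounce, `z_after_bounce` lies in the west half of the numbering while `g` still returns the boundary column, and applying `straight_move` $2k$ times returns to the starting value of $z$.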

Without loss of generality, we assume that the chain starts from some horizontal state $s_0 = (x_0, y_0, l_0)$ with $l_0 \in \{E, W\}$. Let $T_1, T_2, \dots$ ($1 \le T_1 < T_2 < \cdots$) be the times that the random walk makes a turn. Let $s_t$ be the state the random walk visits at the $t$th step$^{14}$, and $A_t$ be the number of turns made by the random walk up to time $t$. In the following, we consider two cases: (1) a target state $s = (x, y, l)$ with $l \in \{E, W\}$, i.e., a horizontal state, and (2) a target state with $l \in \{N, S\}$, i.e., a vertical state, and show that at $t = 6k$, for both cases

$$\Pr\{s_t = s\} \ge \frac{C}{k^2}.$$

1) $s$ is a horizontal state. In this case, we focus on $A_t = 2$ (so $s_t$ is also a horizontal state), and show that

$$\Pr\{s_t = s\} \ge \Pr\{s_t = s, A_t = 2\} \ge \frac{C}{k^2}. \tag{47}$$

Note that a horizontal state $s$ is fully characterized by $y$ and $z$ (since $x = g(z)$). Thus, the state at time 0 can be represented as $(y_0, z_0)$, as illustrated in Fig. 12. Now, consider the state at time $t$. First, observe that $y_t$ is determined only by the direction of the first turn at $T_1$, which may be towards north or south, as illustrated by the two states labeled with $T_1$ in Fig. 12. If the turn is towards north, we have

$$y_t = g(y_0 + T_2 - T_1 \pmod{2k}); \tag{48}$$

if it is towards south, we have

$$y_t = g(2k - 1 - y_0 + T_2 - T_1 \pmod{2k}) = g(-y_0 + T_2 - T_1 - 1 \pmod{2k}). \tag{49}$$

$^{14}$In our notation, in the $t$th step, the random walk goes from state $s_{t-1}$ to $s_t$.
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(g(yo+T2-Ti), 

-Zo-Ti+1+t-T2) 



(yo. zo) 



©- 



O 



W E 



N 



© 



(g(-yo+T2-Ti-i), 

-Zo-Ti+1+t-T2) 

Fig. 12. Illustration of states traversed till time t with two turns 



(g(yo+T2-Ti), 

Z0+T1 +t-T2) 



(g(-yo+T2-Ti-i), 

Z0+T1 +t-T2) 



Second, observe that $z_t$ is determined only by the direction of the second turn at $T_2$, which may be the same as the one in which the random walk is moving at time $T_1 - 1$, or the opposite. In the former case (the two east states at time $T_2$ shown in Fig. 12), it can be shown (by observing the two periods $[1, T_1 - 1]$ and $[T_2, t]$ within which the random walk is traveling horizontally) that
\[
z_t = z_0 + (T_1 - 1) + (t - T_2 + 1) \pmod{2k} = z_0 + T_1 - T_2 + t \pmod{2k}; \tag{50}
\]
in the latter case (the two west states at time $T_2$ shown in Fig. 12), we have
\[
z_t = 2k - 1 - (z_0 + T_1 - 1) + (t - T_2 + 1) \pmod{2k} = -z_0 - T_1 - T_2 + t + 1 \pmod{2k}. \tag{51}
\]

Therefore, we have at $t = 6k$,
\[
\begin{aligned}
\Pr\{s_t = s\} &\ge \Pr\{s_t = s,\ A_t = 2\} \\
&\ge \Pr\{g(y_0 + T_2 - T_1) = y \pmod{2k},\ -z_0 - T_1 - T_2 + t + 1 = z \pmod{2k},\ A_t = 2\} \\
&\quad + \Pr\{g(-y_0 + T_2 - T_1 - 1) = y \pmod{2k},\ -z_0 - T_1 - T_2 + t + 1 = z \pmod{2k},\ A_t = 2\},
\end{aligned}
\]
where the second inequality comes from picking two combinations of $y_t$ and $z_t$ out of the four possible combinations formed from (48)-(51). Assuming that $g(i) = i$ (the case for $g(i) = 2k - 1 - i$ can be similarly argued), and letting $a = y - y_0$, $b = t - z_0 - z + 1$ and $c = y + y_0 + 1$, we get
\[
\begin{aligned}
\Pr\{s_t = s\} &\ge \Pr\{T_2 - T_1 = a \pmod{2k},\ T_1 + T_2 = b \pmod{2k},\ A_t = 2\} \\
&\quad + \Pr\{T_2 - T_1 = c \pmod{2k},\ T_1 + T_2 = b \pmod{2k},\ A_t = 2\}.
\end{aligned}
\]

Note that $T_2 - T_1$ and $T_1 + T_2$ must have the same parity, so we need to consider two cases: if $a$ and $b$ have the same parity, then there exists at least one pair $(T_1, T_2)$ with $1 \le T_1 < T_2 \le t$ (e.g., $T_1 = \bigl(\tfrac{b-a}{2} - 1 \pmod{2k}\bigr) + 1$ and $T_2 = \bigl(\tfrac{a+b}{2} - 1 \pmod{2k}\bigr) + 2k + 1$) such that $T_2 - T_1 = a \pmod{2k}$ and $T_1 + T_2 = b \pmod{2k}$ are satisfied; if $a$ and $b$ have different parities, then $c$ and $b$ must have the same parity (since $c - a = 2y_0 + 1$ is odd), and there exists at least one pair $(T_1, T_2)$ with $1 \le T_1 < T_2 \le t$ such that the second set of equations above is satisfied. Either of the two cases occurs with a probability $\frac{1}{4k^2}\bigl(1 - \frac{1}{k}\bigr)^{t-2}$. Using the fact that $\bigl(1 - \frac{1}{k}\bigr)^{k} \ge 1/4$ for $k \ge 2$, at $t = 6k$ we get
\[
\Pr\{s_t = s\} \ge \frac{1}{4k^2}\Bigl(1 - \frac{1}{k}\Bigr)^{t-2} \ge \frac{2^{-12}}{4k^2}. \tag{52}
\]



2) $s$ is a vertical state. We show that in this case it is sufficient to consider the case of $A_t = 3$. Similarly as above, a vertical state $s$ is fully characterized by $x$ and $z$. Note that $x_t$ is only determined by the direction of the second turn. Similar to (50) and (51), two possible values for $x_t$ are given by
\[
x_t = \begin{cases}
g(z_0 + T_1 - T_2 + T_3 - 1 \pmod{2k}) \\
g(-z_0 - T_1 - T_2 + T_3 \pmod{2k}).
\end{cases} \tag{53}
\]
Also $z_t$ is only determined by the directions of the first turn and the third turn. It can be shown that the four possible values of $z_t$ are given by
\[
z_t = \begin{cases}
y_0 + t - T_1 + T_2 - T_3 + 1 \pmod{2k} \\
-y_0 + t + T_1 - T_2 - T_3 \pmod{2k} \\
-y_0 + t - T_1 + T_2 - T_3 \pmod{2k} \\
y_0 + t + T_1 - T_2 - T_3 + 1 \pmod{2k}.
\end{cases} \tag{54}
\]
Therefore,
\begin{align}
\Pr\{s_t = s\} &\ge \Pr\{s_t = s,\ A_t = 3\} \notag \\
&\ge \Pr\{z_0 + T_1 - T_2 + T_3 - 1 = x \pmod{2k},\ y_0 + t + T_1 - T_2 - T_3 + 1 = z \pmod{2k},\ A_t = 3\} \notag \\
&\quad + \Pr\{z_0 + T_1 - T_2 + T_3 - 1 = x \pmod{2k},\ -y_0 + t + T_1 - T_2 - T_3 = z \pmod{2k},\ A_t = 3\} \notag \\
&= \Pr\{T_3 - (T_2 - T_1) = a \pmod{2k},\ T_3 + (T_2 - T_1) = b \pmod{2k},\ A_t = 3\} \tag{55} \\
&\quad + \Pr\{T_3 - (T_2 - T_1) = a \pmod{2k},\ T_3 + (T_2 - T_1) = c \pmod{2k},\ A_t = 3\}, \tag{56}
\end{align}

where the second inequality comes from picking two combinations out of the eight possible combinations formed from (53) and (54), and in the last equality, we have substituted $a = x - z_0 + 1$, $b = y_0 + t - z + 1$ and $c = -y_0 + t - z$. As in 1), we must consider two cases on parity. For $a$ and $b$ with the same parity, consider the $2k$ triplets $(T_1, T_2, T_3)$ given by
\[
\Bigl(\tau_1,\ \bigl(\tfrac{b-a}{2} - 1 \pmod{2k}\bigr) + 1 + \tau_1,\ \bigl(\tfrac{a+b}{2} - 1 \pmod{2k}\bigr) + 1 + 4k\Bigr), \quad \tau_1 = 1, 2, \ldots, 2k.
\]
It is obvious that any such triplet satisfies $1 \le T_1 < T_2 < T_3 \le 6k$, as well as the conditions in (55). For $a$ and $b$ with different parities, $a$ and $c$ must have the same parity, and similarly there exist at least $2k$ valid triplets $(T_1, T_2, T_3)$ satisfying the conditions in (56). Thus, for any target vertical state $s$, we can always find $2k$ turning times $(T_1, T_2, T_3)$ with proper turning directions to reach $s$ at $t = 6k$ with probability
\[
\Pr\{s_t = s\} \ge 2k \cdot \frac{1}{8k^3}\Bigl(1 - \frac{1}{k}\Bigr)^{t-3}. \tag{57}
\]

This completes the proof. 
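The bound can be checked numerically on small grids. The sketch below is not the authors' code: it assumes a one-step law reconstructed from the equations of this appendix (keep the direction with probability $1 - 1/k$, turn to each perpendicular direction with probability $1/(2k)$, and in every case advance one position in the circular numbering of the line being traveled). The names `advance`, `transitions` and `min_visit_probability` are hypothetical.

```python
DIRS = ("E", "W", "N", "S")


def advance(x, y, l, k):
    """Move one step along direction l in circular numbering (bounce at the boundary)."""
    if l in ("E", "W"):
        z = x if l == "E" else 2 * k - x - 1
        z = (z + 1) % (2 * k)
        return (min(z, 2 * k - z - 1), y, "E" if z < k else "W")
    z = y if l == "N" else 2 * k - y - 1
    z = (z + 1) % (2 * k)
    return (x, min(z, 2 * k - z - 1), "N" if z < k else "S")


def transitions(state, k):
    """Assumed one-step law: keep w.p. 1 - 1/k, each 90-degree turn w.p. 1/(2k)."""
    x, y, l = state
    turns = ("N", "S") if l in ("E", "W") else ("E", "W")
    out = [(advance(x, y, l, k), 1 - 1 / k)]
    out += [(advance(x, y, nl, k), 1 / (2 * k)) for nl in turns]
    return out


def min_visit_probability(k, t):
    """Smallest t-step transition probability over all (start, target) pairs."""
    states = [(x, y, l) for x in range(k) for y in range(k) for l in DIRS]
    worst = 1.0
    for start in states:
        dist = {start: 1.0}
        for _ in range(t):
            nxt = {}
            for s, pr in dist.items():
                for s2, w in transitions(s, k):
                    nxt[s2] = nxt.get(s2, 0.0) + pr * w
            dist = nxt
        worst = min(worst, min(dist.get(s, 0.0) for s in states))
    return worst


for k in (2, 3, 4):
    worst = min_visit_probability(k, 6 * k)
    bound = 2 ** -12 / (4 * k * k)
    print(k, worst, bound, worst >= bound)
```

The exact distribution is propagated rather than sampled, so the comparison with the analytical bound $2^{-12}/(4k^2)$ is free of Monte Carlo noise; the bound is loose, and the computed minimum is expected to sit well above it.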
B. Proof of Lemma 5.2 

Assume the unit square is coordinated by $(x, y)$ with $x, y \in [0, 1]$, starting from the south-west corner. Denote the state space of the chain $P_1$ by $\mathcal{S}$. A state $s \in \mathcal{S}$ is represented with a triplet $s = (x, y, l)$ following the grid case in Appendix A. Define an auxiliary parameter $z$ for a state $s$ as follows:
\[
z = \begin{cases}
x, & l = E \\
2 - x, & l = W \\
y, & l = N \\
2 - y, & l = S.
\end{cases}
\]
We will show that by the time $t = 6k + 1$, for any state $s \in \mathcal{S}$, $\Pr\{s_t = s\} \ge c_1 \pi(s)$ for some positive constant $c_1$.

Consider a movement of the random walk. Denote the distance traveled in the direction of movement, and that orthogonal to the direction of movement at time $t$ respectively by $\alpha_t$ and $\beta_t$, as shown in Fig. 13. Since nodes are randomly and uniformly distributed and the transition probability is uniform for all neighbors in the same direction, we can calculate the expected values of $\alpha_t$ and $\beta_t$ (with respect to the node distribution) as follows:
\[
\mathbb{E}(\alpha_t) = \frac{4}{\pi r^2} \int_{-\pi/4}^{\pi/4} \int_0^r x^2 \cos\theta \, dx \, d\theta = \frac{4\sqrt{2}}{3\pi} r \triangleq \mu_\alpha, \tag{58}
\]
\[
\mathbb{E}(\beta_t) = \frac{4}{\pi r^2} \int_{-\pi/4}^{\pi/4} \int_0^r x^2 \sin\theta \, dx \, d\theta = 0. \tag{59}
\]
Similarly, their second-order moments can be readily computed as
\[
\mathbb{E}(\alpha_t^2) = \frac{4}{\pi r^2} \int_{-\pi/4}^{\pi/4} \int_0^r x^3 \cos^2\theta \, dx \, d\theta = \frac{\pi + 2}{4\pi} r^2, \tag{60}
\]
\[
\mathbb{E}(\beta_t^2) = \frac{4}{\pi r^2} \int_{-\pi/4}^{\pi/4} \int_0^r x^3 \sin^2\theta \, dx \, d\theta = \frac{\pi - 2}{4\pi} r^2, \tag{61}
\]
and the variances of $\alpha_t$ and $\beta_t$ are given by
\[
\mathrm{Var}(\alpha_t) = \Bigl(\frac{\pi + 2}{4\pi} - \frac{32}{9\pi^2}\Bigr) r^2 \triangleq \sigma_\alpha^2, \tag{62}
\]
\[
\mathrm{Var}(\beta_t) = \frac{\pi - 2}{4\pi} r^2 \triangleq \sigma_\beta^2. \tag{63}
\]
Note that $\alpha_t$ and $\beta_t$ are uncorrelated, i.e.,
\[
\mathbb{E}\bigl((\alpha_t - \mu_\alpha)\beta_t\bigr) = \mathbb{E}(\alpha_t \beta_t) = \frac{4}{\pi r^2} \int_{-\pi/4}^{\pi/4} \int_0^r x^3 \cos\theta \sin\theta \, dx \, d\theta = 0. \tag{64}
\]
In the following, we assume $k = 1/\mu_\alpha + 1$ and the turning probability $p = 1/k = \Theta(r)$.
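The closed forms of the moments in (58)-(64) can be cross-checked by numerical integration over the 90-degree sector. This is a minimal sketch, assuming only that the next node is uniformly distributed over a sector of radius $r$ centered on the direction of movement; `sector_moments` and `closed_forms` are illustrative names.

```python
import math


def sector_moments(r, n_theta=200, n_x=200):
    """Midpoint-rule moments of (alpha, beta) for a point uniform in the sector."""
    d_theta = (math.pi / 2) / n_theta
    d_x = r / n_x
    density = 4 / (math.pi * r * r)  # 1 / (sector area)
    m_a = m_b = m_aa = m_bb = m_ab = 0.0
    for i in range(n_theta):
        theta = -math.pi / 4 + (i + 0.5) * d_theta
        c, s = math.cos(theta), math.sin(theta)
        for j in range(n_x):
            x = (j + 0.5) * d_x
            w = density * x * d_x * d_theta  # probability mass of the polar cell
            a, b = x * c, x * s              # along-track and cross-track distance
            m_a += w * a
            m_b += w * b
            m_aa += w * a * a
            m_bb += w * b * b
            m_ab += w * a * b
    return m_a, m_b, m_aa, m_bb, m_ab


def closed_forms(r):
    """Closed-form values corresponding to (58), (60), (61), (62) and (63)."""
    mu_alpha = 4 * math.sqrt(2) * r / (3 * math.pi)
    e_alpha2 = (math.pi + 2) / (4 * math.pi) * r * r
    e_beta2 = (math.pi - 2) / (4 * math.pi) * r * r
    var_alpha = e_alpha2 - mu_alpha ** 2
    var_beta = e_beta2
    return mu_alpha, e_alpha2, e_beta2, var_alpha, var_beta


r = 0.1
print(sector_moments(r))
print(closed_forms(r))
```

The cross-track variance comes out larger than the along-track variance, which is the ordering used later when the maximum total variance is computed.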

Without loss of generality, we assume that the random walk starts from some arbitrary horizontal state $s_0 = (x_0, y_0, l_0)$ with $l_0 \in \{E, W\}$, $y_0 = a_0 \mu_\alpha$ for some $a_0 \in \{0, 1, \ldots, k-1\}$ and the corresponding $z_0 = b_0 \mu_\alpha$ for some $b_0 \in \{0, 1, \ldots, 2k-1\}$.$^{15}$ Similar to Appendix A, we need to consider two cases: the target state $s$ being a horizontal state and the target state $s$ being a vertical state. In the following, we will focus on the former case, and the proof for the latter case is similar.

First consider the expected location $\mathbb{E}(s_t)$ of the random walk at $t$. It depends only on the turning times and turning directions, and evolves according to the random walk $P$ on the $k \times k$ grid (see Section IV).$^{16}$

$^{15}$ Recall that a horizontal state is completely characterized by $y$ and $z$. The proof is essentially the same for non-integer $a_0$ and $b_0$, with a little more complicated notation.

Fig. 13. Illustration of moving distances and target set



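Thus, according to Appendix A, at $t = 6k$, for any $a' \in \{0, 1, \ldots, k-1\}$ and $b' \in \{0, 1, \ldots, 2k-1\}$, we have
\[
\Pr\{\mathbb{E}(y_t) = a'\mu_\alpha,\ \mathbb{E}(z_t) = b'\mu_\alpha\} \ge \Pr\{\mathbb{E}(y_t) = a'\mu_\alpha,\ \mathbb{E}(z_t) = b'\mu_\alpha,\ A_t = 2\} \ge \frac{c_2}{4k^2} \tag{65}
\]
for some $c_2 > 0$.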

In order to obtain a lower bound for the probability of reaching a target horizontal state $s$ at $t = 6k + 1$, we first obtain a lower bound for the probability of reaching any ancestor of $s$ in the underlying graph of the chain at $t = 6k$. For example, consider an east state $s$ of node $i$ as in Fig. 13. Note that the effective west neighboring region of node $i$ covers a circular sector of 90 degrees (for boundary nodes virtual neighbors are considered). It can be shown that such a circular sector contains a square of side $\mu_\alpha$ as depicted in Fig. 13 (for boundary nodes the corresponding square is folded along the boundary). Denote the set of ancestor states of $s$ in this square (east states of physical west neighbors of $i$, with west states taken in place of virtual ones) by $\bar{\mathcal{S}} = \{s' : y \in Y,\ z \in Z,\ l \in \{E, W\}\}$, where generally for a non-boundary node we have $Y = [a\mu_\alpha, (a+1)\mu_\alpha)$ and $Z = [b\mu_\alpha, (b+1)\mu_\alpha)$ for some $a \in [0, k-2]$ and $b \in [0, 2k-2]$, and $l$ is the direction of the target state.$^{17}$ In the following, we assume $i$ is not a boundary node for simplicity, but the proof extends easily to the boundary nodes.

$^{16}$ If $p = c/k$ for some positive $c \ne 1$, then the expected location would evolve according to another chain which differs from $P$ only in the turning probability, and has the same scaling law in the mixing time as $P$.

$^{17}$ In the above example, if $i$ is a west boundary node, then the square under consideration is folded along the west boundary, such that $Z = [0, (1-b)\mu_\alpha) \cup [2 - b\mu_\alpha, 2)$ for some $b \in (0, 1)$, with the latter interval corresponding to west states of the nodes that serve as virtual west neighbors. Note that in all cases, both $Y$ and $Z$ consist of intervals with a total length $\mu_\alpha$.
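We claim that at $t = 6k$,
\[
\sum_{a'=0}^{k-1} \sum_{b'=0}^{2k-1} \Pr\{s_t \in \bar{\mathcal{S}} \mid \mathbb{E}(y_t) = a'\mu_\alpha,\ \mathbb{E}(z_t) = b'\mu_\alpha,\ A_t = 2\} \ge c' \tag{66}
\]
for some constant $c'$ w.h.p. Based on this result and (65), we have at $t = 6k$,
\[
\Pr\{s_t \in \bar{\mathcal{S}}\} \ge \frac{c_2}{4k^2} \sum_{a'=0}^{k-1} \sum_{b'=0}^{2k-1} \Pr\{s_t \in \bar{\mathcal{S}} \mid \mathbb{E}(y_t) = a'\mu_\alpha,\ \mathbb{E}(z_t) = b'\mu_\alpha,\ A_t = 2\} \ge \frac{c' c_2}{4k^2}. \tag{67}
\]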

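By Lemma 5.1, for $r$ above the threshold required there, $d_{\max} \triangleq \max_{i,l} d_i^l \le c_3 n r^2$ for some constant $c_3 > 0$ w.h.p., thus we have
\[
\Pr\{s_{6k+1} = s\} \ge \frac{1}{2 d_{\max}} \sum_{s' \in \bar{\mathcal{S}}} \Pr\{s_{6k} = s'\} \ge \frac{1/2}{c_3 n r^2} \cdot \frac{c' c_2}{4k^2} \triangleq \frac{c_4}{4n}. \tag{68}
\]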



Note that the random walk $P$ has a uniform stationary distribution on the $k \times k$ grid. Using the argument as above, it can be shown that for any set $\bar{\mathcal{S}}$ containing states of the same type in a square of side $\mu_\alpha$, the stationary probability of $P_1$ satisfies $\pi(\bar{\mathcal{S}}) = \Theta(1/k^2)$, and consequently the stationary probability of any state of $P_1$ is lower bounded by $\frac{c_5}{4n}$ for some $c_5 > 0$ (c.f. (68)). For an upper bound, note that in Fig. 13 the effective west neighboring region of $i$ is also contained in an area $A$ consisting of $2 \times 3$ squares of side $\mu_\alpha$. Let $\mathcal{S}^E$, $\mathcal{S}^N$ and $\mathcal{S}^S$ respectively denote the set of east states$^{18}$, the set of north states and the set of south states of (physical and virtual) west neighbors of $i$ that lie in $A$. By Lemma 5.1, for $r$ above the threshold required there, $d_{\min} \triangleq \min_{i,l} d_i^l \ge c_6 n r^2$ w.h.p. Hence for any state $s$,



7r(s) <{l-p) 



7r(£) 



< 



(l-p) + |-2 



cenr^ 4fc^ An 



We conclude that the stationary distribution of $P_1$ is approximately uniform, i.e., for any $s \in \mathcal{S}$, $\frac{c_5}{4n} \le \pi(s) \le \frac{c_7}{4n}$ for some $c_5, c_7 > 0$. It follows from (68) that $\Pr\{s_{6k+1} = s\} \ge \frac{c_4}{c_7} \pi(s) \triangleq c_1 \pi(s)$ w.h.p., which implies that the fill time of $P_1$ is $T_{\mathrm{fill}}(P_1, \epsilon) = O(r^{-1})$ w.h.p.

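We are left to verify the claim (66). It is sufficient to consider the case that the random walk makes two turns in the first $6k$ steps, with the turning times $T_1$ and $T_2$. Denote the distance vector traveled at the $t$th step by
\[
\Delta_t = \begin{cases}
[\alpha_t \ \ \beta_t]^T, & t \in [1, T_1) \cup [T_2, 6k] \\
[\beta_t \ \ \alpha_t]^T, & t \in [T_1, T_2),
\end{cases} \tag{70}
\]
with mean
\[
\mathbb{E}(\Delta_t) \triangleq \mu_\Delta = \begin{cases}
[\mu_\alpha \ \ 0]^T, & t \in [1, T_1) \cup [T_2, 6k] \\
[0 \ \ \mu_\alpha]^T, & t \in [T_1, T_2),
\end{cases} \tag{71}
\]
and covariance matrix (note $\alpha_t$ and $\beta_t$ are uncorrelated)
\[
\Sigma_{\Delta_t} = \begin{cases}
\begin{bmatrix} \sigma_\alpha^2 & 0 \\ 0 & \sigma_\beta^2 \end{bmatrix}, & t \in [1, T_1) \cup [T_2, 6k] \\[2ex]
\begin{bmatrix} \sigma_\beta^2 & 0 \\ 0 & \sigma_\alpha^2 \end{bmatrix}, & t \in [T_1, T_2).
\end{cases} \tag{72}
\]

$^{18}$ For virtual west neighbors, their west states are considered instead.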



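As the distance vectors in different steps are independent, the covariance matrix of the total distance vector $\Delta = \sum_{t=1}^{6k} \Delta_t$ is given by
\[
\Sigma_{T_1, T_2} = \begin{bmatrix} \sigma^2_{\alpha|T_1,T_2} & 0 \\ 0 & \sigma^2_{\beta|T_1,T_2} \end{bmatrix}, \tag{73}
\]
where
\[
\sigma^2_{\alpha|T_1,T_2} = [T_1 + (6k - T_2)]\sigma_\alpha^2 + (T_2 - T_1)\sigma_\beta^2 = (\sigma_\beta^2 - \sigma_\alpha^2)(T_2 - T_1) + 6k\sigma_\alpha^2 \tag{74}
\]
and
\[
\sigma^2_{\beta|T_1,T_2} = [T_1 + (6k - T_2)]\sigma_\beta^2 + (T_2 - T_1)\sigma_\alpha^2 = (\sigma_\alpha^2 - \sigma_\beta^2)(T_2 - T_1) + 6k\sigma_\beta^2 \tag{75}
\]
are the respective variances of the total distance traveled horizontally and vertically in $6k$ steps. As $\sigma_\beta^2 > \sigma_\alpha^2$, it is easy to verify that the maxima of $\sigma^2_{\alpha|T_1,T_2}$ and $\sigma^2_{\beta|T_1,T_2}$ (with respect to $T_1$ and $T_2$) are the same:
\[
\sigma_\alpha^2 + (6k - 1)\sigma_\beta^2. \tag{76}
\]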
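The claim that both total variances share the same maximum can be confirmed by brute force over all admissible turning times. The sketch below assumes the per-step variances $\sigma_\alpha^2 = \bigl(\frac{\pi+2}{4\pi} - \frac{32}{9\pi^2}\bigr) r^2$ and $\sigma_\beta^2 = \frac{\pi-2}{4\pi} r^2$ from earlier in this appendix; the function names and the sample values of $k$ and $r$ are illustrative.

```python
import math


def step_variances(r):
    """Per-step along-track and cross-track variances for a 90-degree sector of radius r."""
    var_alpha = ((math.pi + 2) / (4 * math.pi) - 32 / (9 * math.pi ** 2)) * r * r
    var_beta = (math.pi - 2) / (4 * math.pi) * r * r
    return var_alpha, var_beta


def total_variances(t1, t2, k, var_alpha, var_beta):
    """Total horizontal and vertical variances after 6k steps, as in (74) and (75)."""
    horizontal_steps = t1 + (6 * k - t2)  # steps in [1, T1) and [T2, 6k]
    vertical_steps = t2 - t1              # steps in [T1, T2)
    var_h = horizontal_steps * var_alpha + vertical_steps * var_beta
    var_v = horizontal_steps * var_beta + vertical_steps * var_alpha
    return var_h, var_v


def max_total_variances(k, r):
    """Maximize both variances over all 1 <= T1 < T2 <= 6k."""
    va, vb = step_variances(r)
    pairs = [(t1, t2) for t1 in range(1, 6 * k) for t2 in range(t1 + 1, 6 * k + 1)]
    max_h = max(total_variances(t1, t2, k, va, vb)[0] for t1, t2 in pairs)
    max_v = max(total_variances(t1, t2, k, va, vb)[1] for t1, t2 in pairs)
    claimed = va + (6 * k - 1) * vb  # the common maximum in (76)
    return max_h, max_v, claimed


print(max_total_variances(k=5, r=0.1))
```

The horizontal variance peaks when the vertical excursion is as long as possible ($T_1 = 1$, $T_2 = 6k$), while the vertical variance peaks when it is as short as possible ($T_2 - T_1 = 1$); both extremes give the same value.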



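Let
\[
\Delta_{k,t} = \begin{cases}
\begin{bmatrix} (\alpha_t - \mu_\alpha)/\sigma_{\alpha|T_1,T_2} \\ \beta_t/\sigma_{\beta|T_1,T_2} \end{bmatrix}, & t \in [1, T_1) \cup [T_2, 6k] \\[2ex]
\begin{bmatrix} \beta_t/\sigma_{\alpha|T_1,T_2} \\ (\alpha_t - \mu_\alpha)/\sigma_{\beta|T_1,T_2} \end{bmatrix}, & t \in [T_1, T_2),
\end{cases} \tag{77}
\]
we have $\mathbb{E}(\Delta_{k,t}) = 0$ and $\lim_{n \to \infty} \sum_{t=1}^{6k} \mathbb{E}(\Delta_{k,t} \Delta_{k,t}^T) = I$, where $I$ is the $2 \times 2$ identity matrix. In addition, by defining $\mathbb{E}(y; C) = \mathbb{E}(y \mathbf{1}_C)$ with $\mathbf{1}_C$ being the indicator function of $C$, for any $\epsilon > 0$,
\[
\lim_{n \to \infty} \sum_{t=1}^{6k} \mathbb{E}\bigl(|\Delta_{k,t}|^2;\ |\Delta_{k,t}| > \epsilon\bigr) = 0, \tag{78}
\]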



SUBMITTED TO IEEE TRANS. INFORM. THEORY.



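since $|\Delta_{k,t}|$ is always less than $\epsilon$ when $n$ is sufficiently large: each step has length at most $r$, whereas $\sigma_{\alpha|T_1,T_2}$ and $\sigma_{\beta|T_1,T_2}$ scale as $\Theta(\sqrt{r})$. Then according to the multivariate Lindeberg-Feller Theorem ([26] Proposition 2.27), the conditional probability density function (PDF) of
\[
\sum_{t=1}^{6k} \Delta_{k,t} = \begin{bmatrix} (z_{6k} - \mathbb{E}(z_{6k}))/\sigma_{\alpha|T_1,T_2} \\ (y_{6k} - \mathbb{E}(y_{6k}))/\sigma_{\beta|T_1,T_2} \end{bmatrix} \tag{79}
\]
given $T_1$ and $T_2$ converges in distribution to the standard multivariate normal distribution $\mathcal{N}(0, I)$.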

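Suppose $\mathcal{T}_{\{a',b'\}}$ is the set of turning times combinations$^{19}$ that result in $\mathbb{E}(z_t) = b'\mu_\alpha$, $\mathbb{E}(y_t) = a'\mu_\alpha$, and
\[
\{T_{1,\{a',b'\}}, T_{2,\{a',b'\}}\} = \arg\min_{\{T_1, T_2\} \in \mathcal{T}_{\{a',b'\}}} \Pr\{z_t \in [b\mu_\alpha, (b+1)\mu_\alpha),\ y_t \in [a\mu_\alpha, (a+1)\mu_\alpha) \mid T_1, T_2\}
\]
for any $a \in [0, k-2]$ and $b \in [0, 2k-2]$. Define
\[
\Pi(X; \Delta, \Sigma) = \frac{1}{2\pi\sqrt{|\Sigma|}} \exp\Bigl\{-\frac{1}{2}(X - \Delta)^T \Sigma^{-1} (X - \Delta)\Bigr\}
\]
as the PDF value of the multivariate normal distribution $\mathcal{N}(\Delta, \Sigma)$ at $X$, and (c.f. (73))
\[
\Pi_{\{a',b'\}}(X) \triangleq \Pi\bigl(X;\ [b'\mu_\alpha \ \ a'\mu_\alpha]^T,\ \Sigma_{T_{1,\{a',b'\}}, T_{2,\{a',b'\}}}\bigr).
\]
Then for any $a \in [0, k-2]$ and $b \in [0, 2k-2]$, we can always find a matrix (c.f. (76))
\[
\Sigma_0 = \begin{bmatrix} \sigma_{\alpha_0}^2 & 0 \\ 0 & \sigma_{\beta_0}^2 \end{bmatrix}
\]
satisfying
\[
\frac{1}{2\pi\sqrt{|\Sigma_0|}} \le \min_{\substack{a' = 0, \ldots, k-1 \\ b' = 0, 1, \ldots, 2k-1}} \Bigl\{ \Pi_{\{a',b'\}}([b\mu_\alpha \ \ a\mu_\alpha]^T),\ \Pi_{\{a',b'\}}([(b+1)\mu_\alpha \ \ a\mu_\alpha]^T),\ \Pi_{\{a',b'\}}([b\mu_\alpha \ \ (a+1)\mu_\alpha]^T),\ \Pi_{\{a',b'\}}([(b+1)\mu_\alpha \ \ (a+1)\mu_\alpha]^T) \Bigr\}. \tag{80}
\]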



$^{19}$ The turning times determine $\mathbb{E}(z_t)$ and $\mathbb{E}(y_t)$ (for fixed turning directions), but not vice versa: there may exist multiple combinations of $\{T_1, T_2\}$ which result in the same $\{\mathbb{E}(z_t), \mathbb{E}(y_t)\}$.

This allows us to define an auxiliary normal distribution with an arbitrary mean and covariance matrix $\Sigma_0$ whose maximal PDF value is less than the minimum PDF values of all $\Pr\{z_{6k}, y_{6k} \mid \mathbb{E}(z_{6k}) = b'\mu_\alpha,\ \mathbb{E}(y_{6k}) = a'\mu_\alpha,\ A_{6k} = 2\}$ ($a' = 0, \ldots, k-1$, $b' = 0, \ldots, 2k-1$) in the square $\{s : z \in [b\mu_\alpha, (b+1)\mu_\alpha],\ y \in [a\mu_\alpha, (a+1)\mu_\alpha]\}$. Therefore, as $n \to \infty$,

fe-l 2A;-1 

^ ^ Pr {yek G [a//a, (a + !)//«), ^6fc G {b + | lE(?/6fc) = a /Xa, lE(26fc) = &Va, ^6fc = 2} 

a'=0 b'=0 

k-l 2k-l „(a+l)noc /•(6+l)/Ua 1 



a'=0 b'=0 •^"'^o 



{yt - a'naf {zt - b'naf 



exp 



20-2 20-2 



fc-1 2fc-l ^(a+l)^„ M+l)iic 



> 



a'=0 b'=0 "^"Z*" •^''1 



a'=0 )''° 



1 



{yt-a'na? {zt-b'ixaf , 
■ exp ^ — }dztdyt 



2al 

Po 



2< 



27ro- 



■ exp 



Po 



yt 



2a 



2k-l 

2 r^y* 



00 



b'=0 ''i>-b')^^c 



27ro-, 



■ exp 



dzt 



. E / ° 



eM-:jzr}dyt E / 



27rcr 



/3o 



l3o 



2k" 



ao 



fe-2 

E 



// n'2„2 2'=-2 ,2, ,2 

exp{-^} }^ -^=_exp{-^}, 



2% ' t^i V2Traao 



(81) 



a'=l v27rC7/3„ 

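where the first inequality is based on the definition of $\{T_{1,\{a',b'\}}, T_{2,\{a',b'\}}\}$, and the second one comes from (80). Noting that $\mu_\alpha/\sigma_{\alpha_0}$ and $\mu_\alpha/\sigma_{\beta_0}$ scale as $\Theta(\sqrt{r})$, while $k\mu_\alpha/\sigma_{\alpha_0}$ and $k\mu_\alpha/\sigma_{\beta_0}$ go to $\infty$ as $n \to \infty$, the last line in (81) converges to
\[
\int_0^\infty \frac{1}{\sqrt{2\pi}} \exp\Bigl\{-\frac{x^2}{2}\Bigr\} dx \int_0^\infty \frac{1}{\sqrt{2\pi}} \exp\Bigl\{-\frac{y^2}{2}\Bigr\} dy = \frac{1}{4},
\]
which concludes the proof.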
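The limiting step can be illustrated numerically. The sketch below assumes that each factor in the last line of (81) has the form of a Riemann sum $\sum_j h\,\phi(jh)$ of the standard normal density $\phi$, with mesh $h$ playing the role of $\mu_\alpha/\sigma_{\alpha_0}$ (or $\mu_\alpha/\sigma_{\beta_0}$) and the number of terms growing so that the covered range diverges; this form and the function name are assumptions made for illustration.

```python
import math


def half_gaussian_riemann_sum(h, terms):
    """Right-endpoint Riemann sum of the standard normal density on (0, terms * h]."""
    return sum(
        h / math.sqrt(2 * math.pi) * math.exp(-((j * h) ** 2) / 2)
        for j in range(1, terms + 1)
    )


# Shrinking mesh with a fixed covered range of 20 standard deviations.
for h in (0.5, 0.1, 0.01):
    terms = int(round(20 / h))
    one_factor = half_gaussian_riemann_sum(h, terms)
    print(h, one_factor, one_factor ** 2)
```

As the mesh shrinks, each factor approaches $1/2$ from below and the product approaches $1/4$, which is the constant lower bound $c'$ needed in the claim (66).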



C. LADA-U Algorithm 

In this appendix, we introduce the LADA-U (Uniform) algorithm, which achieves the goal of distributed averaging by simulating a nonreversible chain with uniform stationary distribution on the geometric random graph. In LADA-U, each node $i$ holds four values $y_i^l$, $l = 0, \ldots, 3$ corresponding to the four directions, all initialized to $x_i(0)$. During each iteration, the east value of node $i$ is updated with



= (I-P) 



jeNf U 



+\p{yl{t) + yKt)) 



E f^.E^-i^ 



dn 



d 



yUt) 
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where dmax = niaxj ^ d\, and p = B(r) is defined similarly as in LADA. Note that the boundary effect 
have been addressed through virtual neighbors as in LADA. The north, west and south values are updated in the same fashion. Node $i$ computes its estimate of $x_{\mathrm{ave}}$ with $x_i(t+1) = \frac{1}{4} \sum_{l=0}^{3} y_i^l(t+1)$.

We then give some performance analysis for LADA-U. With $\mathbf{y}$ denoted as in LADA, the iteration can be written as $\mathbf{y}(t+1) = P_2^T \mathbf{y}(t)$, where $P_2$ is a doubly stochastic matrix through our design. The exchange weights for an east value of some node $i$ are illustrated in Fig. 14: a fraction $p/2$ of the east value goes to each of the north and south values of the same node, a total fraction of $\frac{d_i^E}{d_{\max}}(1 - p)$ goes uniformly to the east values of east neighbors, and the remaining $\bigl(1 - \frac{d_i^E}{d_{\max}}\bigr)(1 - p)$ goes to the west value of node $i$. The transitions between the east and west states make up for the differences among the directional degrees $d_i^l$, and ensure that the incoming probabilities for each state also sum to 1. While such a design guarantees
that the associated chain has a uniform stationary distribution, it also introduces some diffusive behavior, 
hence the centralized performance can only be achieved with a larger r. In the following, we show that 
for LADA-U, Tavc(e) = 0(r~Mog(e"^)) when the transmission radius r = ^(^^) ''^ with high 
probability. 
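The double stochasticity of the exchange weights can be checked on a small example. The sketch below is a simplification rather than the algorithm as specified: it places nodes on a unit torus so that no virtual neighbors are needed, classifies neighbors into four 90-degree sectors, and takes $p = r$ as a stand-in for $p = \Theta(r)$. The names `build_lada_u` and `iterate` are hypothetical.

```python
import random

OPPOSITE = {"E": "W", "W": "E", "N": "S", "S": "N"}
PERPENDICULAR = {"E": ("N", "S"), "W": ("N", "S"), "N": ("E", "W"), "S": ("E", "W")}


def wrap(d):
    """Shortest signed displacement on the unit torus (assumption: no boundary)."""
    return (d + 0.5) % 1.0 - 0.5


def direction(dx, dy):
    """Classify a displacement into one of four 90-degree sectors."""
    if abs(dx) >= abs(dy):
        return "E" if dx > 0 else "W"
    return "N" if dy > 0 else "S"


def build_lada_u(points, r, p):
    """Return {state: {next_state: weight}} for states (node, direction)."""
    n = len(points)
    nbrs = [{l: [] for l in OPPOSITE} for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = wrap(points[j][0] - points[i][0])
            dy = wrap(points[j][1] - points[i][1])
            if dx * dx + dy * dy <= r * r:
                l = direction(dx, dy)
                nbrs[i][l].append(j)            # j lies in direction l of i
                nbrs[j][OPPOSITE[l]].append(i)  # so i lies in the opposite direction of j
    d_max = max(1, max(len(nbrs[i][l]) for i in range(n) for l in OPPOSITE))
    P = {}
    for i in range(n):
        for l in OPPOSITE:
            row = {(j, l): (1 - p) / d_max for j in nbrs[i][l]}
            # Leftover mass goes to the opposite-direction value of the same node.
            row[(i, OPPOSITE[l])] = (1 - p) * (1 - len(nbrs[i][l]) / d_max)
            for turn in PERPENDICULAR[l]:
                row[(i, turn)] = p / 2
            P[(i, l)] = row
    return P


def iterate(P, y):
    """One iteration y(t+1) = P^T y(t)."""
    y_next = {s: 0.0 for s in P}
    for s, row in P.items():
        for s2, w in row.items():
            y_next[s2] += w * y[s]
    return y_next


random.seed(1)
pts = [(random.random(), random.random()) for _ in range(40)]
x0 = [random.random() for _ in pts]
P2 = build_lada_u(pts, r=0.3, p=0.3)
y = {(i, l): x0[i] for i in range(len(pts)) for l in OPPOSITE}
for _ in range(200):
    y = iterate(P2, y)
estimates = [sum(y[(i, l)] for l in OPPOSITE) / 4 for i in range(len(pts))]
print(max(abs(e - sum(x0) / len(x0)) for e in estimates))
```

Because every row and every column of the weight matrix sums to one, the sum of all values is preserved at each iteration, which is what lets the per-node estimates settle at the true average when the graph is connected.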

It can be shown that the expected location of the random walk $P_2$ evolves according to a random walk $P'$ on the $k \times k$ grid, where $k = 1/\mu_\alpha + 1$ as defined in Appendix B. $P'$ differs from $P$ used in Section IV in two aspects: 1) there are additional probabilities of moving between states of opposite directions corresponding to the same node; 2) a 90 degree turn is towards a state corresponding to the same node
instead of the next node in the turning direction. Recall that from Lemma 5.1, when r = 17 
we have d[ = ^^^(1 ib 0(1)) for all i and / w.h.p. Thus for each move, the probability that the random 
walk P' keeps the direction is at least (1 — p'j^ssiis- = (^i — i/k){l — 0{r)) > 1 — ^ for some constant 
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ci > 1 w.h.p. During the first 6k moves, the probabiUty that the random walk P' makes exactly two 90 
degree turns towards given directions at given times Ti and T2, and keeps direction for the remaining 
moves is at least ^ ^ — TF^" Then, following the argument in Appendix A, if the random 

walk P' starts from an east or west state, any east or west state can be reached with probabiUty at least 
^-^P- in 6A; steps (note that the modification in the 90 degree turns only causes constant shifts in the 
expressions of $s_t$, and does not affect the result). The case for north and south states can be similarly argued, and we conclude that the state distribution of the random walk $P'$ is approximately uniform at $t = 6k$ w.h.p. Then, following the analysis in Appendix B, it can be shown that the exact location of random walk $P_2$ is also approximately uniform at $t = 6k$, which by the uniformity of the stationary distribution of $P_2$ implies that the $\epsilon$-mixing time of $P_2$, as well as the $\epsilon$-averaging time of LADA-U, is $O(r^{-1} \log(\epsilon^{-1}))$ w.h.p.

D. Distributed Clustering 

We assume each node $i$ has an initial seed $s_i$ which is unique within its neighborhood. This can be realized through, e.g., drawing a random number from a large common pool, or simply using nodes' IDs. From time 0, each node $i$ starts a timer with length $t_i = s_i$, which is decremented by 1 at each time instant as long as it is greater than 0. If node $i$'s timer expires (reaches 0), it becomes a cluster-head, and broadcasts a "cluster_initialize" message to all its neighbors. Each of its neighbors with a timer greater than 0 signals its intention to join the cluster by replying with a "cluster_join" message, and also sets its timer to 0. If a node receives more than one "cluster_initialize" message at the same time, it randomly chooses one cluster-head and replies with the "cluster_join" message. At the end, clusters are formed such that every node belongs to one and only one cluster. The uniqueness of seeds within the neighborhood ensures that cluster-heads are at least of distance $r$ from each other. We assume that clusters are formed in advance and the overhead is amortized over the multiple computations. The detailed algorithm is given in Algorithm 4.
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Algorithm 4 Distributed Clustering 
K ← 0 {K: number of clusters}
for all i ∈ V do
    t_i ← s_i
end for
repeat
    for all i with t_i > 0 do
        t_i ← t_i − 1
        if t_i = 0 then
            K ← K + 1, C_K ← {i} {C_K: nodes in cluster K}
            for all j ∈ N_i with t_j > 0 do
                t_j ← 0, C_K ← C_K ∪ {j}
            end for
        end if
    end for
until ∪_k C_k = V
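The timer-based procedure can be simulated centrally to see the clusters it produces. This is a minimal sketch, assuming the sequential processing order of Algorithm 4 in place of real message exchanges; the function name, the $4 \times 4$ grid topology and the seed assignment are illustrative choices, not part of the paper.

```python
def distributed_clustering(neighbors, seeds):
    """Simulate Algorithm 4.

    neighbors: dict mapping each node to an iterable of its neighbors.
    seeds: dict mapping each node to a positive integer timer length.
    Returns a list of clusters; each cluster lists its head first.
    """
    if any(seed <= 0 for seed in seeds.values()):
        raise ValueError("seeds must be positive")
    timers = dict(seeds)
    clusters = []
    assigned = set()
    while len(assigned) < len(neighbors):
        for i in sorted(neighbors):
            if timers[i] <= 0:
                continue
            timers[i] -= 1
            if timers[i] == 0:
                # Node i becomes a cluster-head; pending neighbors join it.
                members = [i]
                assigned.add(i)
                for j in neighbors[i]:
                    if timers[j] > 0:
                        timers[j] = 0
                        members.append(j)
                        assigned.add(j)
                clusters.append(members)
    return clusters


# Illustrative topology: a 4 x 4 grid with 4-neighbor connectivity.
side = 4
grid = {
    (a, b): [
        (a + da, b + db)
        for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1))
        if 0 <= a + da < side and 0 <= b + db < side
    ]
    for a in range(side)
    for b in range(side)
}
# Distinct seeds: 7 is coprime with 16, so the map below is a bijection onto 1..16.
grid_seeds = {(a, b): 1 + (7 * (a * side + b)) % (side * side) for (a, b) in grid}
for cluster in distributed_clustering(grid, grid_seeds):
    print("head", cluster[0], "members", cluster[1:])
```

A node that joins a cluster has its timer forced to 0 and therefore can never become a head itself, so no two heads are neighbors and every node ends up in exactly one cluster.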
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